A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. INN Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help in formulating profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# this will help in making the Python code more structured automatically (good coding practice)
!pip install black[jupyter] --quiet
import warnings
warnings.filterwarnings("ignore")
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter("ignore", ConvergenceWarning)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
from matplotlib.ticker import MaxNLocator
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
# to compute VIF
from statsmodels.stats.outliers_influence import variance_inflation_factor
# to build linear regression_model using statsmodels
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
# to build linear regression_model
from sklearn.linear_model import LinearRegression
# to build logistic regression_model
from sklearn.linear_model import LogisticRegression
# to check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    # plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
custom = {"axes.edgecolor": "purple", "grid.linestyle": "solid", "grid.color": "black"}
sns.set_style("dark", rc=custom)
#format numeric data for easier readability
pd.set_option("display.float_format", lambda x: "{:.2f}".format(x)) # to display numbers rounded off to 2 decimal places
%matplotlib inline
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# let colab access my google drive
from google.colab import drive
drive.mount("/content/drive")
Mounted at /content/drive
# Loading the dataset from a CSV file stored on Google Drive
df = pd.read_csv("/content/drive/MyDrive/Python_Course/Project_4/INNHotelsGroup.csv")
df.head()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | INN00002 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | INN00003 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | INN00004 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | INN00005 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
df.tail()
| | Booking_ID | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 36270 | INN36271 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | Not_Canceled |
| 36271 | INN36272 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | Canceled |
| 36272 | INN36273 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 36273 | INN36274 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 36274 | INN36275 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
df.shape
(36275, 19)
There are 36275 rows and 19 columns in the dataset.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
There are 36275 rows and 19 columns in the data frame.
Booking_ID, type_of_meal_plan, room_type_reserved, market_segment_type, and booking_status are all objects. These should be converted to categories.
no_of_adults, no_of_children, no_of_weekend_nights, no_of_week_nights, required_car_parking_space, lead_time, arrival_year, arrival_month, arrival_date, repeated_guest, no_of_previous_cancellations, no_of_previous_bookings_not_canceled, and no_of_special_requests are all integers.
avg_price_per_room is a float.
The dependent variable is booking_status.
There is no missing data.
df.nunique()
Booking_ID 36275 no_of_adults 5 no_of_children 6 no_of_weekend_nights 8 no_of_week_nights 18 type_of_meal_plan 4 required_car_parking_space 2 room_type_reserved 7 lead_time 352 arrival_year 2 arrival_month 12 arrival_date 31 market_segment_type 5 repeated_guest 2 no_of_previous_cancellations 9 no_of_previous_bookings_not_canceled 59 avg_price_per_room 3930 no_of_special_requests 6 booking_status 2 dtype: int64
# Copy data to avoid any changes to the original data
df2 = df.copy()
# converting "object" columns to "category" reduces the memory required to store the dataframe
# converting type_of_meal_plan, market_segment_type, booking_status, and room_type_reserved into categorical data
# not converting Booking_ID as it increases the memory
# booking_status is converted here as well, although it will later be encoded as 0 or 1
for col in ["type_of_meal_plan", "market_segment_type", "booking_status", "room_type_reserved"]:
    df2[col] = df2[col].astype("category")
# Use info() to print a concise summary of the DataFrame
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  category
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  category
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  category
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  category
dtypes: category(4), float64(1), int64(13), object(1)
memory usage: 4.3+ MB
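The memory figure reported by info() drops when low-cardinality text columns are stored as categories, while a column with one distinct value per row (like Booking_ID) gains nothing. The sketch below illustrates that effect on synthetic data, not on the INN Hotels file; the column names, the row count, and the helper deep_bytes are made up for the illustration.

```python
import pandas as pd

# Synthetic stand-in data (an assumption, not the INN Hotels dataset)
n = 10_000
plans = ["Meal Plan 1", "Meal Plan 2", "Meal Plan 3", "Not Selected"]
demo = pd.DataFrame(
    {
        # few distinct values repeated many times
        "meal_plan": plans * (n // len(plans)),
        # every value is unique, similar to a booking identifier
        "booking_id": [f"INN{i:05d}" for i in range(n)],
    }
)


def deep_bytes(series):
    """Bytes used by a Series, counting the stored strings themselves."""
    return series.memory_usage(deep=True)


low_card_object = deep_bytes(demo["meal_plan"])
low_card_category = deep_bytes(demo["meal_plan"].astype("category"))
unique_object = deep_bytes(demo["booking_id"])
unique_category = deep_bytes(demo["booking_id"].astype("category"))

print(f"meal_plan   as text: {low_card_object:>9,}  as category: {low_card_category:>9,}")
print(f"booking_id  as text: {unique_object:>9,}  as category: {unique_category:>9,}")
```

A category column stores each distinct label once plus a small integer code per row, so the saving depends on how often labels repeat.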
# Convert the category columns into objects:
# Identify categorical columns
categorical_cols = df2.select_dtypes(['category']).columns
categorical_cols
# Convert categorical columns to object
df2[categorical_cols] = df2[categorical_cols].astype('object')
Index(['type_of_meal_plan', 'room_type_reserved', 'market_segment_type',
'booking_status'],
dtype='object')
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 19 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   Booking_ID                            36275 non-null  object
 1   no_of_adults                          36275 non-null  int64
 2   no_of_children                        36275 non-null  int64
 3   no_of_weekend_nights                  36275 non-null  int64
 4   no_of_week_nights                     36275 non-null  int64
 5   type_of_meal_plan                     36275 non-null  object
 6   required_car_parking_space            36275 non-null  int64
 7   room_type_reserved                    36275 non-null  object
 8   lead_time                             36275 non-null  int64
 9   arrival_year                          36275 non-null  int64
 10  arrival_month                         36275 non-null  int64
 11  arrival_date                          36275 non-null  int64
 12  market_segment_type                   36275 non-null  object
 13  repeated_guest                        36275 non-null  int64
 14  no_of_previous_cancellations          36275 non-null  int64
 15  no_of_previous_bookings_not_canceled  36275 non-null  int64
 16  avg_price_per_room                    36275 non-null  float64
 17  no_of_special_requests                36275 non-null  int64
 18  booking_status                        36275 non-null  object
dtypes: float64(1), int64(13), object(5)
memory usage: 5.3+ MB
All variable types are now correct.
df2.duplicated().sum()
0
df2.isnull().sum()
Booking_ID 0 no_of_adults 0 no_of_children 0 no_of_weekend_nights 0 no_of_week_nights 0 type_of_meal_plan 0 required_car_parking_space 0 room_type_reserved 0 lead_time 0 arrival_year 0 arrival_month 0 arrival_date 0 market_segment_type 0 repeated_guest 0 no_of_previous_cancellations 0 no_of_previous_bookings_not_canceled 0 avg_price_per_room 0 no_of_special_requests 0 booking_status 0 dtype: int64
# checking for unique values
df2.nunique()
Booking_ID 36275 no_of_adults 5 no_of_children 6 no_of_weekend_nights 8 no_of_week_nights 18 type_of_meal_plan 4 required_car_parking_space 2 room_type_reserved 7 lead_time 352 arrival_year 2 arrival_month 12 arrival_date 31 market_segment_type 5 repeated_guest 2 no_of_previous_cancellations 9 no_of_previous_bookings_not_canceled 59 avg_price_per_room 3930 no_of_special_requests 6 booking_status 2 dtype: int64
df2.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 36275.00 | 1.84 | 0.52 | 0.00 | 2.00 | 2.00 | 2.00 | 4.00 |
| no_of_children | 36275.00 | 0.11 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 10.00 |
| no_of_weekend_nights | 36275.00 | 0.81 | 0.87 | 0.00 | 0.00 | 1.00 | 2.00 | 7.00 |
| no_of_week_nights | 36275.00 | 2.20 | 1.41 | 0.00 | 1.00 | 2.00 | 3.00 | 17.00 |
| required_car_parking_space | 36275.00 | 0.03 | 0.17 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| lead_time | 36275.00 | 85.23 | 85.93 | 0.00 | 17.00 | 57.00 | 126.00 | 443.00 |
| arrival_year | 36275.00 | 2017.82 | 0.38 | 2017.00 | 2018.00 | 2018.00 | 2018.00 | 2018.00 |
| arrival_month | 36275.00 | 7.42 | 3.07 | 1.00 | 5.00 | 8.00 | 10.00 | 12.00 |
| arrival_date | 36275.00 | 15.60 | 8.74 | 1.00 | 8.00 | 16.00 | 23.00 | 31.00 |
| repeated_guest | 36275.00 | 0.03 | 0.16 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| no_of_previous_cancellations | 36275.00 | 0.02 | 0.37 | 0.00 | 0.00 | 0.00 | 0.00 | 13.00 |
| no_of_previous_bookings_not_canceled | 36275.00 | 0.15 | 1.75 | 0.00 | 0.00 | 0.00 | 0.00 | 58.00 |
| avg_price_per_room | 36275.00 | 103.42 | 35.09 | 0.00 | 80.30 | 99.45 | 120.00 | 540.00 |
| no_of_special_requests | 36275.00 | 0.62 | 0.79 | 0.00 | 0.00 | 0.00 | 1.00 | 5.00 |
Average adults is between 1 and 2.
Most people do not bring children.
Average no_of_weekend_nights is less than 1.
Average no_of_week_nights is ~2.
lead_time is between 0 and 443 days, with the average being ~85 days.
arrival_year is between 2017 and 2018.
Leading Questions:
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(df2, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Creates a combined boxplot and histogram for a given feature in the dataset.

    Args:
        df2: The input dataframe.
        feature (str): The column name for which to create the plot.
        figsize (tuple, optional): Size of the figure (default: (12, 7)).
        kde (bool, optional): Whether to show the density curve (default: False).
        bins (int, optional): Number of bins for the histogram (default: None).

    Returns:
        None (displays the plot)
    """
    fig, (ax_box, ax_hist) = plt.subplots(
        nrows=2,
        sharex=True,
        figsize=figsize,
        gridspec_kw={"height_ratios": (0.25, 0.75)},
    )
    # Boxplot
    sns.boxplot(data=df2, x=feature, ax=ax_box, showmeans=True, color="#F72585")
    # Histogram: when no bins are given, use 24 equal-width bins around the data range
    if bins is None:
        unique_values = df2[feature].unique()
        bins = np.linspace(unique_values.min() - 1, unique_values.max() + 2, num=25)
    # pass the kde argument through instead of always drawing the density curve
    sns.histplot(data=df2, x=feature, bins=bins, kde=kde, ax=ax_hist)
    # Add mean and median lines
    ax_hist.axvline(df2[feature].mean(), color="purple", linestyle="--", label="Mean")
    ax_hist.axvline(df2[feature].median(), color="blue", linestyle="-", label="Median")
    # Label each bar with its count
    for p in ax_hist.patches:
        ax_hist.annotate(
            f"{int(p.get_height())}",
            (p.get_x() + p.get_width() / 2.0, p.get_height()),
            ha="center",
            va="center",
            xytext=(1, 10),
            textcoords="offset points",
        )
    ax_hist.legend()
    ax_hist.set_xlabel(feature)
    ax_hist.set_ylabel("Frequency")
    ax_hist.set_title(f"Frequency of {feature}")
    plt.tight_layout()
# function to create labeled barplots
def labeled_barplot(df2, feature, perc=False, n=None):
    """
    Barplot with the count and the percentage at the top of each bar

    df2: dataframe
    feature: dataframe column
    perc: currently unused; the count and the percentage are both always drawn
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(df2[feature])  # length of the column
    count = df2[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 10))
    else:
        plt.figure(figsize=(n + 1, 10))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=df2,
        x=feature,
        hue=feature,  # assign the x variable to hue so the palette is applied per bar
        palette="cubehelix",
        legend=False,  # disable the legend
        order=df2[feature].value_counts().index[:n].sort_values(),
    )
    # Annotate each bar with its count and percentage
    for p in ax.patches:
        prc = "{:.1f}%".format(100.0 * p.get_height() / total)  # percentage
        cnt = int(p.get_height())  # count
        xx = p.get_x() + p.get_width() / 2  # x coordinate of the bar labels
        yy = p.get_height()  # y coordinate of the bar labels
        # Annotate percentage
        ax.annotate(
            prc,
            (xx, yy),
            ha="center",
            va="center",
            style="italic",
            size=12,
            xytext=(0, 10),
            textcoords="offset points",
        )
        # Annotate count (placed above the percentage label)
        ax.annotate(
            cnt,
            (xx, yy + 100),
            ha="center",
            va="bottom",
            size=12,
            xytext=(0, 20),
            textcoords="offset points",
        )
    # Extend the y-axis by 500 so the labels fit
    plt.ylim(0, ax.get_ylim()[1] + 500)
def stacked_barplot(df2, predictor, target, palette=None):
    """
    Print the category counts and plot a stacked bar chart

    df2: dataframe
    predictor: independent variable
    target: target variable
    palette: list of colors (optional)
    """
    count = df2[predictor].nunique()
    sorter = df2[target].value_counts().index[-1]
    # Use a custom palette or fall back to the default colors below
    if palette:
        colors = palette
    else:
        # Default colors: Teal, Violet, Fuchsia, Navy, and Crimson
        colors = ["#06C2AC", "#9A0EEA", "#ED0DD9", "#0000BB", "#DC143C"]
    tab1 = pd.crosstab(df2[predictor], df2[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(df2[predictor], df2[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    # Plot using the specified colors
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5), color=colors)
    # Place the legend outside the plot area
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
### function to plot distributions wrt target
def distribution_plot_wrt_target(df2, predictor, target):
    fig, axs = plt.subplots(2, 2, figsize=(15, 10))
    target_uniq = df2[target].unique()
    axs[0, 0].set_title("Distribution of target for target=" + str(target_uniq[0]))
    sns.histplot(
        data=df2[df2[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="aqua",
        stat="density",
    )
    axs[0, 1].set_title("Distribution of target for target=" + str(target_uniq[1]))
    sns.histplot(
        data=df2[df2[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="indigo",
        stat="density",
    )
    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=df2, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")
    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=df2,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="plasma",
    )
    plt.tight_layout()
    plt.show()
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_adults")
# Set the x-axis label
plt.xlabel("No of Adults per Booking")
df2["no_of_adults"].value_counts()
print()
df2["no_of_adults"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'No of Adults per Booking')
no_of_adults 2 26108 1 7695 3 2317 0 139 4 16 Name: count, dtype: int64
count 36275.00 mean 1.84 std 0.52 min 0.00 25% 2.00 50% 2.00 75% 2.00 max 4.00 Name: no_of_adults, dtype: float64
<Figure size 2000x600 with 0 Axes>
Most of the bookings have 2 adults.
139 bookings show zero adults. These should be researched further.
Most INN hotels do not allow kids under 16 to stay without an adult.
If we were supplied the ages, we would be able to determine whether the zeros are correct or need to be replaced by the mean.
Without that additional information we will leave the zeros as they are. All of these bookings include children; none has both zero adults and zero children.
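The check behind that last observation can be sketched as a pair of boolean filters. The example below runs on a small made-up frame rather than the hotel data, so the numbers it prints are illustrative only; toy, zero_adults, and empty_bookings are hypothetical names.

```python
import pandas as pd

# Toy bookings (made-up values, not rows from the INN Hotels file)
toy = pd.DataFrame(
    {
        "no_of_adults": [2, 0, 1, 0, 2],
        "no_of_children": [0, 2, 0, 3, 1],
    }
)

# Bookings that list no adults
zero_adults = toy[toy["no_of_adults"] == 0]

# Of those, bookings that list nobody at all (no adults and no children)
empty_bookings = zero_adults[zero_adults["no_of_children"] == 0]

print(len(zero_adults), "bookings with zero adults")
print(len(empty_bookings), "of those also have zero children")
# How many children the zero-adult bookings carry
print(zero_adults["no_of_children"].value_counts().sort_index())
```

Applied to df2, the same two filters would show whether any booking is completely empty.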
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_children")
# Set the x-axis label
plt.xlabel("No of Children per Booking")
df2["no_of_children"].value_counts()
print()
df2["no_of_children"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'No of Children per Booking')
no_of_children 0 33577 1 1618 2 1058 3 19 9 2 10 1 Name: count, dtype: int64
count 36275.00 mean 0.11 std 0.40 min 0.00 25% 0.00 50% 0.00 75% 0.00 max 10.00 Name: no_of_children, dtype: float64
<Figure size 2000x600 with 0 Axes>
25%, 50%, and 75% are zero.
Most adults book without children.
The maximum number of children is 10. The bookings with 9 or 10 children are outliers that should be removed.
These outliers are rare (3 bookings in total), but they could still skew the results.
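One possible way to act on the outlier note is a simple threshold filter. The sketch below rebuilds the no_of_children column from the value counts printed above and drops rows beyond a cutoff; the cutoff of 3 (MAX_CHILDREN) is an assumption chosen for illustration, not a rule taken from the notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the column from the value_counts() output shown above
children_counts = {0: 33577, 1: 1618, 2: 1058, 3: 19, 9: 2, 10: 1}
children = pd.Series(
    np.repeat(list(children_counts.keys()), list(children_counts.values())),
    name="no_of_children",
)

# Hypothetical cutoff: anything above 3 children is treated as an outlier
MAX_CHILDREN = 3
is_outlier = children > MAX_CHILDREN
cleaned = children[~is_outlier]

print("rows flagged:", int(is_outlier.sum()))
print("rows kept:", len(cleaned))
print("max after filtering:", int(cleaned.max()))
```

On the full frame the same mask would be applied as df2[df2["no_of_children"] <= MAX_CHILDREN], which removes the whole booking rather than only the value.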
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_weekend_nights")
# Set the x-axis label
plt.xlabel("No of Weekend Nights per Booking")
df2["no_of_weekend_nights"].value_counts()
print()
df2["no_of_weekend_nights"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'No of Weekend Nights per Booking')
no_of_weekend_nights 0 16872 1 9995 2 9071 3 153 4 129 5 34 6 20 7 1 Name: count, dtype: int64
count 36275.00 mean 0.81 std 0.87 min 0.00 25% 0.00 50% 1.00 75% 2.00 max 7.00 Name: no_of_weekend_nights, dtype: float64
<Figure size 2000x600 with 0 Axes>
Most of the bookings include between 0 and 2 weekend nights, with zero being the most common value.
Average bookings include 0 to 1 weekend night.
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_week_nights")
# Set the x-axis label
plt.xlabel("No of Week Nights per Booking")
df2["no_of_week_nights"].value_counts()
print()
df2["no_of_week_nights"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'No of Week Nights per Booking')
no_of_week_nights 2 11444 1 9488 3 7839 4 2990 0 2387 5 1614 6 189 7 113 10 62 8 62 9 34 11 17 15 10 12 9 14 7 13 5 17 3 16 2 Name: count, dtype: int64
count 36275.00 mean 2.20 std 1.41 min 0.00 25% 1.00 50% 2.00 75% 3.00 max 17.00 Name: no_of_week_nights, dtype: float64
<Figure size 2000x600 with 0 Axes>
Max number of week nights is 17.
Average number of week nights is 2.
Most bookings include between 0 and 3 week nights.
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "required_car_parking_space")
# Set the x-axis label
plt.xlabel("Required Car Parking Space per Booking")
df2["required_car_parking_space"].value_counts()
print()
df2["required_car_parking_space"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'Required Car Parking Space per Booking')
required_car_parking_space 0 35151 1 1124 Name: count, dtype: int64
count 36275.00 mean 0.03 std 0.17 min 0.00 25% 0.00 50% 0.00 75% 0.00 max 1.00 Name: required_car_parking_space, dtype: float64
<Figure size 2000x600 with 0 Axes>
Most people do not require a car parking space.
Out of all the bookings, only 1,124 asked for a parking space.
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "lead_time")
# Set the x-axis label
plt.xlabel("Lead Time per Booking")
df2["lead_time"].value_counts()
print()
df2["lead_time"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'Lead Time per Booking')
lead_time
0 1297
1 1078
2 643
3 630
4 628
...
300 1
353 1
328 1
352 1
351 1
Name: count, Length: 352, dtype: int64
count 36275.00 mean 85.23 std 85.93 min 0.00 25% 17.00 50% 57.00 75% 126.00 max 443.00 Name: lead_time, dtype: float64
<Figure size 2000x600 with 0 Axes>
Average lead time is 85 days.
Most people book their rooms less than 400 days before their stay.
Over 25% of people book their stay 20 days or less before their stay.
Over 75% of people book their stay less than 1/2 year before their stay.
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "arrival_year")
# Set the x-axis label
plt.xlabel("Arrival Year per Booking")
df2["arrival_year"].value_counts()
print()
df2["arrival_year"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'Arrival Year per Booking')
arrival_year 2018 29761 2017 6514 Name: count, dtype: int64
count 36275.00 mean 2017.82 std 0.38 min 2017.00 25% 2018.00 50% 2018.00 75% 2018.00 max 2018.00 Name: arrival_year, dtype: float64
<Figure size 2000x600 with 0 Axes>
Most bookings took place in 2018.
18% of all bookings were in 2017.
82% of all bookings were in 2018.
# Create a figure with a specified size
plt.figure(figsize=(20, 6))
# Plot the histogram and boxplot
histogram_boxplot(df2, "arrival_month")
# Set the x-axis label
plt.xlabel("Arrival Month per Booking")
df2["arrival_month"].value_counts()
print()
df2["arrival_month"].describe().T
<Figure size 2000x600 with 0 Axes>
Text(0.5, 47.722222222222285, 'Arrival Month per Booking')
arrival_month 10 5317 9 4611 8 3813 6 3203 12 3021 11 2980 7 2920 4 2736 5 2598 3 2358 2 1704 1 1014 Name: count, dtype: int64
count 36275.00 mean 7.42 std 3.07 min 1.00 25% 5.00 50% 8.00 75% 10.00 max 12.00 Name: arrival_month, dtype: float64
<Figure size 2000x600 with 0 Axes>
The most bookings took place in month 10 (October).
The fewest bookings took place in January.
Winter had 5,739 bookings. That accounts for 16% of the bookings.
Spring had 7,692 bookings. That accounts for 21% of the bookings.
Summer had 9,936 bookings. That accounts for 27% of the bookings.
Fall had 12,908 bookings. That accounts for 36% of the bookings.
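The season totals above can be reproduced from the monthly counts with a month-to-season mapping. This is a minimal sketch that assumes meteorological seasons (December to February as winter, and so on); the names month_counts and season_of_month are introduced here for illustration.

```python
import pandas as pd

# Bookings per arrival month, copied from the value_counts() output above
month_counts = pd.Series(
    {10: 5317, 9: 4611, 8: 3813, 6: 3203, 12: 3021, 11: 2980,
     7: 2920, 4: 2736, 5: 2598, 3: 2358, 2: 1704, 1: 1014}
)

# Assumed grouping of months into seasons
season_of_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# A dict passed to groupby maps each index label (month) to its group (season)
season_counts = month_counts.groupby(season_of_month).sum()
season_share = (100 * season_counts / season_counts.sum()).round(1)

summary = pd.DataFrame({"bookings": season_counts, "percent": season_share})
print(summary.sort_values("bookings"))
```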
#group by *arrival month*, count number of records per month, sort from most to fewest bookings, and show top 3 months
df2.groupby('arrival_month').count().sort_values(by='booking_status', ascending=False)['booking_status'].head(3)
arrival_month 10 5317 9 4611 8 3813 Name: booking_status, dtype: int64
1. What are the busiest months in the hotel?
The top 3 months for bookings are October (5,317), September (4,611), and August (3,813).
Fall (September - November) is the most popular season to book a hotel room.
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "arrival_date")
# Set the x-axis label
plt.xlabel("Arrival Date per Booking")
df2["arrival_date"].value_counts()
print()
df2["arrival_date"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Arrival Date per Booking')
arrival_date 13 1358 17 1345 2 1331 4 1327 19 1327 16 1306 20 1281 15 1273 6 1273 18 1260 14 1242 30 1216 12 1204 8 1198 29 1190 21 1158 5 1154 26 1146 25 1146 1 1133 9 1130 28 1129 7 1110 24 1103 11 1098 3 1098 10 1089 27 1059 22 1023 23 990 31 578 Name: count, dtype: int64
count 36275.00 mean 15.60 std 8.74 min 1.00 25% 8.00 50% 16.00 75% 23.00 max 31.00 Name: arrival_date, dtype: float64
<Figure size 2500x1000 with 0 Axes>
11,843 bookings have an arrival date between the 1st and the 10th of the month, ~33% of all bookings.
12,694 bookings have an arrival date between the 11th and the 20th of the month, ~35% of all bookings.
11,738 bookings have an arrival date between the 21st and the 31st of the month, ~32% of all bookings.
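The three totals above come from bucketing the day of the month. A hedged sketch of one way to do that with pd.cut is shown below, using the per-day counts printed above; the bucket labels and variable names are illustrative choices.

```python
import pandas as pd

# Bookings per arrival day, copied from the value_counts() output above
day_counts = pd.Series(
    {13: 1358, 17: 1345, 2: 1331, 4: 1327, 19: 1327, 16: 1306, 20: 1281,
     15: 1273, 6: 1273, 18: 1260, 14: 1242, 30: 1216, 12: 1204, 8: 1198,
     29: 1190, 21: 1158, 5: 1154, 26: 1146, 25: 1146, 1: 1133, 9: 1130,
     28: 1129, 7: 1110, 24: 1103, 11: 1098, 3: 1098, 10: 1089, 27: 1059,
     22: 1023, 23: 990, 31: 578}
)

# Right-closed intervals: (0, 10], (10, 20], (20, 31]
buckets = pd.cut(
    day_counts.index.to_numpy(),
    bins=[0, 10, 20, 31],
    labels=["1-10", "11-20", "21-31"],
)

third_counts = day_counts.groupby(buckets, observed=True).sum()
third_share = (100 * third_counts / third_counts.sum()).round(0)
print(pd.DataFrame({"bookings": third_counts, "percent": third_share}))
```

The last bucket covers 11 days instead of 10, and the 31st exists only in some months, which is worth keeping in mind when comparing the shares.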
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "repeated_guest")
# Set the x-axis label
plt.xlabel("Repeated Guest per Booking")
df2["repeated_guest"].value_counts()
print()
df2["repeated_guest"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Repeated Guest per Booking')
repeated_guest 0 35345 1 930 Name: count, dtype: int64
count 36275.00 mean 0.03 std 0.16 min 0.00 25% 0.00 50% 0.00 75% 0.00 max 1.00 Name: repeated_guest, dtype: float64
<Figure size 2500x1000 with 0 Axes>
Most bookings are not repeated guests.
Only 930 bookings are from repeated guests.
More research should be done to determine why more guests are not booking additional stays with the hotels.
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_previous_cancellations")
# Set the x-axis label
plt.xlabel("Number of Previous Cancellations per Booking")
df2["no_of_previous_cancellations"].value_counts()
print()
df2["no_of_previous_cancellations"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Number of Previous Cancellations per Booking')
no_of_previous_cancellations 0 35937 1 198 2 46 3 43 11 25 5 11 4 10 13 4 6 1 Name: count, dtype: int64
count 36275.00 mean 0.02 std 0.37 min 0.00 25% 0.00 50% 0.00 75% 0.00 max 13.00 Name: no_of_previous_cancellations, dtype: float64
<Figure size 2500x1000 with 0 Axes>
Most bookings (35,937) come from guests with no previous cancellations.
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_previous_bookings_not_canceled")
# Set the x-axis label
plt.xlabel("Number of Previous Bookings Not Canceled per Booking")
df2["no_of_previous_bookings_not_canceled"].value_counts()
print()
df2["no_of_previous_bookings_not_canceled"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Number of Previous Bookings Not Canceled per Booking')
no_of_previous_bookings_not_canceled 0 35463 1 228 2 112 3 80 4 65 5 60 6 36 7 24 8 23 10 19 9 19 11 15 12 12 14 9 15 8 16 7 13 7 18 6 20 6 21 6 17 6 19 6 22 6 25 3 27 3 24 3 23 3 44 2 29 2 48 2 28 2 30 2 32 2 31 2 26 2 46 1 55 1 45 1 57 1 53 1 54 1 58 1 41 1 40 1 43 1 35 1 50 1 56 1 33 1 37 1 42 1 51 1 38 1 34 1 39 1 52 1 49 1 47 1 36 1 Name: count, dtype: int64
count 36275.00 mean 0.15 std 1.75 min 0.00 25% 0.00 50% 0.00 75% 0.00 max 58.00 Name: no_of_previous_bookings_not_canceled, dtype: float64
<Figure size 2500x1000 with 0 Axes>
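Per the graph, most bookings (35,463) come from guests with no previous bookings that were not canceled, which is consistent with most guests not being repeated guests.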
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "avg_price_per_room")
# Set the x-axis label
plt.xlabel("Average Price Per Room per Booking")
df2["avg_price_per_room"].value_counts()
print()
df2["avg_price_per_room"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Average Price Per Room per Booking')
avg_price_per_room
65.00 848
75.00 826
90.00 703
95.00 669
115.00 662
...
212.42 1
83.48 1
70.42 1
130.99 1
167.80 1
Name: count, Length: 3930, dtype: int64
count 36275.00 mean 103.42 std 35.09 min 0.00 25% 80.30 50% 99.45 75% 120.00 max 540.00 Name: avg_price_per_room, dtype: float64
<Figure size 2500x1000 with 0 Axes>
The room rate averages around 100.
The 627 that are showing a rate of 20 or less are free or discounted rooms.
Most rooms are less than 250.
75% of all rooms cost 120 or less.
# Create a figure with a specified size
plt.figure(figsize=(25, 10))
# Plot the histogram and boxplot
histogram_boxplot(df2, "no_of_special_requests")
# Set the x-axis label
plt.xlabel("Number of Special Requests per Booking")
df2["no_of_special_requests"].value_counts()
print()
df2["no_of_special_requests"].describe().T
<Figure size 2500x1000 with 0 Axes>
Text(0.5, 47.722222222222285, 'Number of Special Requests per Booking')
no_of_special_requests 0 19777 1 11373 2 4364 3 675 4 78 5 8 Name: count, dtype: int64
count 36275.00 mean 0.62 std 0.79 min 0.00 25% 0.00 50% 0.00 75% 1.00 max 5.00 Name: no_of_special_requests, dtype: float64
<Figure size 2500x1000 with 0 Axes>
Most bookings have 1 or fewer special requests.
The maximum number of special requests is 5.
# Labeled barplot for type of meal plan
labeled_barplot(df, "type_of_meal_plan", perc=True, n=25)
Most guests are picking Meal Plan 1. 76.7% of all guests chose this plan.
The next biggest group is the guests that chose not to have a meal plan. They make up 14.1% of all bookings.
# Labeled barplot for room type reserved
labeled_barplot(df, "room_type_reserved", perc=True, n=25)
Most guests reserve Room Type 1: 77.5% of all guests reserved this type of room.
The next most popular room type is Room Type 4, reserved by 16.7% of all guests.
Room Type 3 is the least popular room. Only 7 guests booked this type of room.
# Labeled barplot for market segment
labeled_barplot(df, "market_segment_type", perc=True, n=25)
The most popular segment is Online, which accounts for 64% of all bookings.
The next most popular segment is Offline, which accounts for 29% of all bookings.
Corporate guests account for only 5.6% of all bookings.
Complementary and Aviation guests together account for only 1.4% of all bookings.
df2.groupby('market_segment_type').count().sort_values(by='booking_status', ascending=False)['booking_status']
market_segment_type
Online 23214
Offline 10528
Corporate 2017
Complementary 391
Aviation 125
Name: booking_status, dtype: int64
2. Which market segment do most of the guests come from?
# Labeled barplot for booking_status
labeled_barplot(df, "booking_status", perc=True, n=25)
11,885 bookings (32.8%) are canceled.
24,390 bookings (67.2%) are not canceled.
heatmap_list = df2.select_dtypes(include=np.number).columns.tolist()
# Plot the correlation heatmap for the numeric columns selected above.
plt.figure(figsize=(15, 7))
sns.heatmap(
df2[heatmap_list].corr(), annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="hsv"
)
plt.show()
no_of_previous_bookings_not_canceled and repeated_guest have a 0.54 correlation.
no_of_previous_bookings_not_canceled is also related to no_of_previous_cancellations, with a correlation of 0.47.
Market Segment compared to Avg Price Per Room
3. Hotel room rates are dynamic and change according to demand and customer demographics.
What is the difference in room prices across market segments?
df2.groupby('market_segment_type').agg({'avg_price_per_room':'mean'}).sort_values(by='avg_price_per_room',ascending=False).reset_index()
| | market_segment_type | avg_price_per_room |
|---|---|---|
| 0 | Online | 112.26 |
| 1 | Aviation | 100.70 |
| 2 | Offline | 91.63 |
| 3 | Corporate | 82.91 |
| 4 | Complementary | 3.14 |
Online has the highest average price at 112.26.
As expected, Complementary has the lowest average price at 3.14. These bookings usually cost between 0 and 20 each.
Corporate rates are the next lowest, at an average of 82.91.
df2['booking_status'].value_counts()
booking_status
Not_Canceled 24390
Canceled 11885
Name: count, dtype: int64
4. What percent of bookings are canceled?
Out of all bookings, 11,885 (32.8%) are canceled.
24,390 bookings (67.2%) are not canceled.
Repeated Guests compared to Booking Status
5. Repeating guests are the guests who stay in the hotel often and are important to brand equity.
What percentage of repeating guests cancel?
df2.groupby('repeated_guest')['booking_status'].value_counts()
repeated_guest booking_status
0 Not_Canceled 23476
Canceled 11869
1 Not_Canceled 914
Canceled 16
Name: count, dtype: int64
stacked_barplot(df, "repeated_guest", "booking_status")
| repeated_guest | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11869 | 23476 | 35345 |
| 1 | 16 | 914 | 930 |
distribution_plot_wrt_target(df2, "repeated_guest", "booking_status")
Repeated guests cancel less than guests who are not repeat guests.
Out of 930 bookings by repeat guests, only 16 were canceled. That is only 1.7% of all their bookings.
Bookings made by non-repeat guests are canceled at a rate of 33.6%.
No of Special Requests compared to Booking Status
6. Many guests have special requirements when booking a room.
Do these affect booking cancellations?
df2.groupby('no_of_special_requests')['booking_status'].value_counts()
no_of_special_requests booking_status
0 Not_Canceled 11232
Canceled 8545
1 Not_Canceled 8670
Canceled 2703
2 Not_Canceled 3727
Canceled 637
3 Not_Canceled 675
4 Not_Canceled 78
5 Not_Canceled 8
Name: count, dtype: int64
stacked_barplot(df2, "no_of_special_requests", "booking_status")
| no_of_special_requests | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 8545 | 11232 | 19777 |
| 1 | 2703 | 8670 | 11373 |
| 2 | 637 | 3727 | 4364 |
| 3 | 0 | 675 | 675 |
| 4 | 0 | 78 | 78 |
| 5 | 0 | 8 | 8 |
distribution_plot_wrt_target(df2, "no_of_special_requests", "booking_status")
The more special requests a booking has, the lower the chance the guest will cancel.
Guests with 3 or more special requests did not cancel any bookings.
Guests with no special requests canceled 43.2% of their bookings.
Guests with 1 or 2 special requests canceled 21.2% of their bookings.
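The percentages quoted above can be reproduced from the counts with a row-normalized crosstab. The sketch below is a minimal illustration, not the notebook's own helper: `cancellation_rate` is a hypothetical function name, and the demo frame is rebuilt from the counts printed above for 0, 1, and 2 special requests rather than loaded from the real data.

```python
import pandas as pd


def cancellation_rate(data, feature, target="booking_status", positive="Canceled"):
    """Percentage of rows labeled `positive` within each level of `feature`."""
    # normalize="index" turns each row of the crosstab into shares that sum to 1
    shares = pd.crosstab(data[feature], data[target], normalize="index")
    return (shares[positive] * 100).round(2)


# Counts copied from the stacked_barplot output above (0, 1 and 2 requests only).
counts = {
    (0, "Canceled"): 8545, (0, "Not_Canceled"): 11232,
    (1, "Canceled"): 2703, (1, "Not_Canceled"): 8670,
    (2, "Canceled"): 637, (2, "Not_Canceled"): 3727,
}
# Expand the counts back into one row per booking.
demo = pd.DataFrame(
    [key for key, n in counts.items() for _ in range(n)],
    columns=["no_of_special_requests", "booking_status"],
)

rates = cancellation_rate(demo, "no_of_special_requests")
print(rates)
```

With the real data, the same call would presumably be `cancellation_rate(df2, "no_of_special_requests")`, and any other categorical or discrete column could be passed as `feature`.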
Room Type Reserved compared to Type of Meal Plan Type
stacked_barplot(df2, "room_type_reserved", "type_of_meal_plan")
| room_type_reserved | Meal Plan 1 | Meal Plan 2 | Meal Plan 3 | Not Selected | All |
|---|---|---|---|---|---|
| All | 27835 | 3305 | 5 | 5130 | 36275 |
| Room_Type 7 | 152 | 2 | 3 | 1 | 158 |
| Room_Type 1 | 20157 | 2934 | 1 | 5038 | 28130 |
| Room_Type 4 | 5748 | 273 | 1 | 35 | 6057 |
| Room_Type 2 | 653 | 16 | 0 | 23 | 692 |
| Room_Type 3 | 5 | 0 | 0 | 2 | 7 |
| Room_Type 5 | 242 | 14 | 0 | 9 | 265 |
| Room_Type 6 | 878 | 66 | 0 | 22 | 966 |
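Across all room types, Meal Plan 1 is the most popular.
Overall, the second most popular choice is no meal plan (Not Selected), driven mainly by Room Type 1. For Room Types 4, 5, and 6, Meal Plan 2 outranks Not Selected.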
Room Type Reserved compared to Market Segment Type
stacked_barplot(df2, "room_type_reserved", "market_segment_type")
| room_type_reserved | Aviation | Complementary | Corporate | Offline | Online | All |
|---|---|---|---|---|---|---|
| All | 125 | 391 | 2017 | 10528 | 23214 | 36275 |
| Room_Type 4 | 65 | 52 | 99 | 613 | 5228 | 6057 |
| Room_Type 1 | 60 | 247 | 1833 | 9747 | 16243 | 28130 |
| Room_Type 2 | 0 | 20 | 2 | 57 | 613 | 692 |
| Room_Type 3 | 0 | 2 | 1 | 2 | 2 | 7 |
| Room_Type 5 | 0 | 17 | 74 | 81 | 93 | 265 |
| Room_Type 6 | 0 | 14 | 3 | 23 | 926 | 966 |
| Room_Type 7 | 0 | 39 | 5 | 5 | 109 | 158 |
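Aviation guests booked only Room Type 4 (65 bookings) and Room Type 1 (60 bookings).
Online guests are the largest segment for every room type except Room Type 3, whose 7 bookings are spread across segments. Online guests make up the largest shares of Room Type 6 (95.9%), Room Type 2 (88.6%), and Room Type 4 (86.3%) bookings.
Market Segment compared to Repeated Guest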
stacked_barplot(df2, "repeated_guest", "market_segment_type")
| repeated_guest | Aviation | Complementary | Corporate | Offline | Online | All |
|---|---|---|---|---|---|---|
| All | 125 | 391 | 2017 | 10528 | 23214 | 36275 |
| 0 | 109 | 265 | 1415 | 10438 | 23118 | 35345 |
| 1 | 16 | 126 | 602 | 90 | 96 | 930 |
Lead Time compared to Booking Status
stacked_barplot(df2, "lead_time", "booking_status")
| lead_time | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 188 | 142 | 11 | 153 |
| 166 | 122 | 19 | 141 |
| 245 | 111 | 3 | 114 |
| 1 | 110 | 968 | 1078 |
| ... | ... | ... | ... |
| 306 | 0 | 2 | 2 |
| 336 | 0 | 15 | 15 |
| 327 | 0 | 15 | 15 |
| 318 | 0 | 1 | 1 |
| 300 | 0 | 1 | 1 |

[353 rows x 3 columns]
distribution_plot_wrt_target(df2, "lead_time", "booking_status")
The longer the lead time, the bigger the chance of cancellation.
There are fewer bookings with long lead times.
df2.groupby('booking_status').agg({'lead_time':'mean'}).sort_values(by='lead_time',ascending=False).reset_index()
| | booking_status | lead_time |
|---|---|---|
| 0 | Canceled | 139.22 |
| 1 | Not_Canceled | 58.93 |
The average lead time for canceled bookings is ~139 days, whereas the average lead time for bookings that were not canceled is ~59 days.
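A comparison of means hides how the cancellation rate changes across the range of lead times. One possible follow-up, sketched here under assumptions, is to bucket `lead_time` with `pd.cut` and compute the rate per bucket. The function name `rate_by_bucket`, the bin edges, the labels, and the eight-row toy frame are all hypothetical and chosen for illustration; only the upper edge of 443 matches the maximum lead time seen in the data.

```python
import pandas as pd


def rate_by_bucket(data, col, bins, labels, target="booking_status"):
    """Percentage of canceled bookings within each bucket of `col`."""
    buckets = pd.cut(data[col], bins=bins, labels=labels, include_lowest=True)
    canceled = data[target].eq("Canceled")
    # The mean of a boolean series is the share of True values.
    return canceled.groupby(buckets, observed=False).mean().mul(100).round(1)


# Toy data for illustration only; these are not rows from the hotel dataset.
toy = pd.DataFrame(
    {
        "lead_time": [1, 5, 20, 45, 100, 150, 200, 300],
        "booking_status": [
            "Not_Canceled", "Not_Canceled", "Not_Canceled", "Canceled",
            "Not_Canceled", "Canceled", "Canceled", "Canceled",
        ],
    }
)

result = rate_by_bucket(
    toy, "lead_time", bins=[0, 30, 90, 443], labels=["0-30", "31-90", "91+"]
)
print(result)
```

On the real data the call would presumably take `df2` in place of `toy`; the bucket boundaries are a modeling choice and would need to be tuned.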
stacked_barplot(df2, "no_of_adults", "booking_status")
| no_of_adults | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 2 | 9119 | 16989 | 26108 |
| 1 | 1856 | 5839 | 7695 |
| 3 | 863 | 1454 | 2317 |
| 0 | 44 | 95 | 139 |
| 4 | 3 | 13 | 16 |
distribution_plot_wrt_target(df2, "no_of_adults", "booking_status")
The number of adults does not seem to strongly affect whether a guest cancels: cancellation rates are 24.1% for 1 adult, 34.9% for 2 adults, and 37.2% for 3 adults.
stacked_barplot(df2, "no_of_children", "booking_status")
| no_of_children | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 10882 | 22695 | 33577 |
| 1 | 540 | 1078 | 1618 |
| 2 | 457 | 601 | 1058 |
| 3 | 5 | 14 | 19 |
| 9 | 1 | 1 | 2 |
| 10 | 0 | 1 | 1 |
distribution_plot_wrt_target(df2, "no_of_children", "booking_status")
Apart from the single booking with 10 children, which was not canceled, the number of children only minimally affects whether the guest cancels.
stacked_barplot(df2, "no_of_weekend_nights", "booking_status")
| no_of_weekend_nights | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 5093 | 11779 | 16872 |
| 1 | 3432 | 6563 | 9995 |
| 2 | 3157 | 5914 | 9071 |
| 4 | 83 | 46 | 129 |
| 3 | 74 | 79 | 153 |
| 5 | 29 | 5 | 34 |
| 6 | 16 | 4 | 20 |
| 7 | 1 | 0 | 1 |
distribution_plot_wrt_target(df2, "no_of_weekend_nights", "booking_status")
The single booking with 7 weekend nights was canceled.
Cancellation rates are lower for bookings with fewer weekend nights.
stacked_barplot(df2, "no_of_week_nights", "booking_status")
| no_of_week_nights | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 2 | 3997 | 7447 | 11444 |
| 3 | 2574 | 5265 | 7839 |
| 1 | 2572 | 6916 | 9488 |
| 4 | 1143 | 1847 | 2990 |
| 0 | 679 | 1708 | 2387 |
| 5 | 632 | 982 | 1614 |
| 6 | 88 | 101 | 189 |
| 10 | 53 | 9 | 62 |
| 7 | 52 | 61 | 113 |
| 8 | 32 | 30 | 62 |
| 9 | 21 | 13 | 34 |
| 11 | 14 | 3 | 17 |
| 15 | 8 | 2 | 10 |
| 12 | 7 | 2 | 9 |
| 13 | 5 | 0 | 5 |
| 14 | 4 | 3 | 7 |
| 16 | 2 | 0 | 2 |
| 17 | 2 | 1 | 3 |
distribution_plot_wrt_target(df2, "no_of_week_nights", "booking_status")
All bookings with 13 or 16 week nights were canceled (5 and 2 bookings, respectively).
Cancellation rates are generally lower for bookings with fewer week nights.
stacked_barplot(df2, "required_car_parking_space", "booking_status")
| required_car_parking_space | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11771 | 23380 | 35151 |
| 1 | 114 | 1010 | 1124 |
distribution_plot_wrt_target(df2, "required_car_parking_space", "booking_status")
Guests that required a parking space canceled less.
Guests who did not require a parking space canceled 33.5% of the time.
Guests who needed a parking space canceled 10.1% of the time.
stacked_barplot(df2, "arrival_year", "booking_status")
| arrival_year | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 2018 | 10924 | 18837 | 29761 |
| 2017 | 961 | 5553 | 6514 |
distribution_plot_wrt_target(df2, "arrival_year", "booking_status")
In 2018, 36.7% of bookings were canceled.
In 2017, 14.8% of bookings were canceled.
82% of all bookings were for 2018.
stacked_barplot(df2, "arrival_month", "booking_status")
| arrival_month | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 10 | 1880 | 3437 | 5317 |
| 9 | 1538 | 3073 | 4611 |
| 8 | 1488 | 2325 | 3813 |
| 7 | 1314 | 1606 | 2920 |
| 6 | 1291 | 1912 | 3203 |
| 4 | 995 | 1741 | 2736 |
| 5 | 948 | 1650 | 2598 |
| 11 | 875 | 2105 | 2980 |
| 3 | 700 | 1658 | 2358 |
| 2 | 430 | 1274 | 1704 |
| 12 | 402 | 2619 | 3021 |
| 1 | 24 | 990 | 1014 |
distribution_plot_wrt_target(df2, "arrival_month", "booking_status")
Arrival month 10 - cancellations 35.36%
Arrival month 9 - cancellations 33.36%
Arrival month 8 - cancellations 39.02%
Arrival month 7 - cancellations 45.00%
Arrival month 6 - cancellations 40.31%
Arrival month 4 - cancellations 36.37%
Arrival month 5 - cancellations 36.49%
Arrival month 11 - cancellations 29.36%
Arrival month 3 - cancellations 29.69%
Arrival month 2 - cancellations 25.23%
Arrival month 12 - cancellations 13.31%
Arrival month 1 - cancellations 2.37%
stacked_barplot(df2, "arrival_date", "booking_status")
| arrival_date | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 15 | 538 | 735 | 1273 |
| 4 | 474 | 853 | 1327 |
| 16 | 473 | 833 | 1306 |
| 30 | 465 | 751 | 1216 |
| 1 | 465 | 668 | 1133 |
| 12 | 460 | 744 | 1204 |
| 17 | 448 | 897 | 1345 |
| 6 | 444 | 829 | 1273 |
| 26 | 425 | 721 | 1146 |
| 19 | 413 | 914 | 1327 |
| 20 | 413 | 868 | 1281 |
| 13 | 408 | 950 | 1358 |
| 28 | 405 | 724 | 1129 |
| 3 | 403 | 695 | 1098 |
| 25 | 395 | 751 | 1146 |
| 21 | 376 | 782 | 1158 |
| 24 | 372 | 731 | 1103 |
| 18 | 366 | 894 | 1260 |
| 7 | 364 | 746 | 1110 |
| 8 | 356 | 842 | 1198 |
| 22 | 351 | 672 | 1023 |
| 23 | 341 | 649 | 990 |
| 29 | 334 | 856 | 1190 |
| 11 | 330 | 768 | 1098 |
| 5 | 328 | 826 | 1154 |
| 14 | 327 | 915 | 1242 |
| 10 | 318 | 771 | 1089 |
| 27 | 313 | 746 | 1059 |
| 2 | 308 | 1023 | 1331 |
| 9 | 294 | 836 | 1130 |
| 31 | 178 | 400 | 578 |
distribution_plot_wrt_target(df2, "arrival_date", "booking_status")
Top 5 cancellation days:
Day 15 - 42.26%
Day 1 - 41.04%
Day 30 - 38.24%
Day 12 - 38.21%
Day 26 - 37.09%
Lowest 5 cancellation days:
Day 2 - 23.14%
Day 9 - 26.02%
Day 14 - 26.33%
Day 29 - 28.07%
Day 5 - 28.42%
stacked_barplot(df2, "no_of_previous_cancellations", "booking_status")
| no_of_previous_cancellations | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11869 | 24068 | 35937 |
| 1 | 11 | 187 | 198 |
| 13 | 4 | 0 | 4 |
| 3 | 1 | 42 | 43 |
| 2 | 0 | 46 | 46 |
| 4 | 0 | 10 | 10 |
| 5 | 0 | 11 | 11 |
| 6 | 0 | 1 | 1 |
| 11 | 0 | 25 | 25 |
distribution_plot_wrt_target(df2, "no_of_previous_cancellations", "booking_status")
Bookings with 2, 4, 5, 6, or 11 previous cancellations had no cancellations.
All 4 bookings with 13 previous cancellations were canceled (100%).
Bookings with 3 previous cancellations had 2.33% canceled.
Bookings with 1 previous cancellation had 5.56% canceled.
Bookings with 0 previous cancellations had 33.03% canceled.
stacked_barplot(df2, "no_of_previous_bookings_not_canceled", "booking_status")
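| no_of_previous_bookings_not_canceled | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 0 | 11878 | 23585 | 35463 |
| 1 | 4 | 224 | 228 |
| 12 | 1 | 11 | 12 |
| 4 | 1 | 64 | 65 |
| 6 | 1 | 35 | 36 |
| 2 | 0 | 112 | 112 |
| 44 | 0 | 2 | 2 |
| 43 | 0 | 1 | 1 |
| 42 | 0 | 1 | 1 |
| 41 | 0 | 1 | 1 |
| 40 | 0 | 1 | 1 |
| 38 | 0 | 1 | 1 |
| 39 | 0 | 1 | 1 |
| 46 | 0 | 1 | 1 |
| 37 | 0 | 1 | 1 |
| 36 | 0 | 1 | 1 |
| 35 | 0 | 1 | 1 |
| 45 | 0 | 1 | 1 |
| 48 | 0 | 2 | 2 |
| 47 | 0 | 1 | 1 |
| 33 | 0 | 1 | 1 |
| 49 | 0 | 1 | 1 |
| 50 | 0 | 1 | 1 |
| 51 | 0 | 1 | 1 |
| 52 | 0 | 1 | 1 |
| 53 | 0 | 1 | 1 |
| 54 | 0 | 1 | 1 |
| 55 | 0 | 1 | 1 |
| 56 | 0 | 1 | 1 |
| 57 | 0 | 1 | 1 |
| 58 | 0 | 1 | 1 |
| 34 | 0 | 1 | 1 |
| 31 | 0 | 2 | 2 |
| 32 | 0 | 2 | 2 |
| 3 | 0 | 80 | 80 |
| 5 | 0 | 60 | 60 |
| 7 | 0 | 24 | 24 |
| 8 | 0 | 23 | 23 |
| 9 | 0 | 19 | 19 |
| 10 | 0 | 19 | 19 |
| 11 | 0 | 15 | 15 |
| 13 | 0 | 7 | 7 |
| 14 | 0 | 9 | 9 |
| 15 | 0 | 8 | 8 |
| 16 | 0 | 7 | 7 |
| 17 | 0 | 6 | 6 |
| 18 | 0 | 6 | 6 |
| 19 | 0 | 6 | 6 |
| 20 | 0 | 6 | 6 |
| 21 | 0 | 6 | 6 |
| 22 | 0 | 6 | 6 |
| 23 | 0 | 3 | 3 |
| 24 | 0 | 3 | 3 |
| 25 | 0 | 3 | 3 |
| 26 | 0 | 2 | 2 |
| 27 | 0 | 3 | 3 |
| 28 | 0 | 2 | 2 |
| 29 | 0 | 2 | 2 |
| 30 | 0 | 2 | 2 |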
distribution_plot_wrt_target(df2, "no_of_previous_bookings_not_canceled", "booking_status")
Only bookings with 0, 1, 4, 6, or 12 previous non-canceled bookings had any cancellations.
Bookings with 0 previous non-canceled bookings had 33.49% canceled.
Bookings with 1 previous non-canceled booking had 1.75% canceled.
Bookings with 12 previous non-canceled bookings had 8.33% canceled.
Bookings with 4 previous non-canceled bookings had 1.54% canceled.
Bookings with 6 previous non-canceled bookings had 2.78% canceled.
df2.groupby('booking_status').agg({'avg_price_per_room':'mean'}).sort_values(by='avg_price_per_room',ascending=False).reset_index()
| | booking_status | avg_price_per_room |
|---|---|---|
| 0 | Canceled | 110.59 |
| 1 | Not_Canceled | 99.93 |
distribution_plot_wrt_target(df2, "avg_price_per_room", "booking_status")
There does not appear to be much correlation between booking status and average price per room.
Canceled bookings average about 10.66 more per room than bookings that were not canceled (110.59 vs 99.93).
plt.figure(figsize=(30, 6))
ax = sns.countplot(x='market_segment_type', data=df2, hue='booking_status', edgecolor='purple')
plt.xlabel('Market Segment')
plt.ylabel('Guest Count')
plt.title('Cancellation Status by Market Segment')
plt.ylim(0, 22000)
# Group by market segment
grouped_df = df2.groupby('market_segment_type')['booking_status'].value_counts()
# Calculate total bookings per segment
total_counts = df2.groupby('market_segment_type')['booking_status'].count()
# Calculate percentage for each booking status
for segment, status in grouped_df.index:
    count = grouped_df.loc[segment, status]
    total = total_counts.loc[segment]
    percentage = (count / total) * 100
    print(f"Segment: {segment}, Status: {status}, Percentage: {percentage:.1f}%")
# Annotate the bars (count and percentage)
for p in ax.patches:
    cnt = p.get_height()
    prc = "{:.1f}%".format(100.0 * p.get_height() / (df2.shape[0]))  # percentage
    xx = p.get_x() + p.get_width() / 2
    yy = p.get_height()
    ax.annotate(f"{prc}", (xx, yy), ha="center", va="center", size=12, xytext=(0, 10), textcoords="offset points")  # annotate percentage
    ax.annotate(cnt, (xx, yy + 1000), ha="center", va="center", size=12, xytext=(0, 10), textcoords="offset points")
# Show the plot
plt.show()
Segment: Aviation, Status: Not_Canceled, Percentage: 70.4%
Segment: Aviation, Status: Canceled, Percentage: 29.6%
Segment: Complementary, Status: Not_Canceled, Percentage: 100.0%
Segment: Corporate, Status: Not_Canceled, Percentage: 89.1%
Segment: Corporate, Status: Canceled, Percentage: 10.9%
Segment: Offline, Status: Not_Canceled, Percentage: 70.1%
Segment: Offline, Status: Canceled, Percentage: 29.9%
Segment: Online, Status: Not_Canceled, Percentage: 63.5%
Segment: Online, Status: Canceled, Percentage: 36.5%
distribution_plot_wrt_target(df2, "market_segment_type", "booking_status")
Online bookings account for 64% of all bookings. Of those, 63.5% are not canceled, whereas 36.5% are canceled.
Offline bookings account for 29% of all bookings. Of those, 70% are not canceled, whereas 30% are canceled.
Corporate bookings account for 5.6% of all bookings. Of those, 89% are not canceled, whereas 11% are canceled.
Complementary bookings account for 1.1% of all bookings. None of them are canceled.
Aviation bookings account for 0.3% of all bookings. Of those, 70% are not canceled, whereas 30% are canceled.
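The plot above mixes two kinds of percentage: the printed lines are shares within each segment, while the bar annotations are shares of all 36,275 bookings. The sketch below puts both side by side so they are not confused. It is an illustration only: the counts are copied from the bar annotations and the segment totals shown earlier, and the column names in `summary` are made up for this example.

```python
import pandas as pd

# Counts per segment, copied from the annotated countplot output.
counts = pd.DataFrame(
    {
        "Canceled": [8475, 3153, 220, 0, 37],
        "Not_Canceled": [14739, 7375, 1797, 391, 88],
    },
    index=["Online", "Offline", "Corporate", "Complementary", "Aviation"],
)

segment_total = counts.sum(axis=1)   # bookings per segment
grand_total = segment_total.sum()    # all bookings

summary = pd.DataFrame(
    {
        # Segment size relative to all bookings
        "share_of_all_bookings_pct": (segment_total / grand_total * 100).round(1),
        # Canceled bookings relative to the segment (what the print loop reports)
        "cancel_rate_within_segment_pct": (counts["Canceled"] / segment_total * 100).round(1),
        # Canceled bookings relative to all bookings (what the bar labels report)
        "canceled_share_of_all_pct": (counts["Canceled"] / grand_total * 100).round(1),
    }
)
print(summary)
```

Reading across the Online row, for example, the 36.5% within-segment rate and the 23.4% bar label describe the same 8,475 canceled bookings against two different denominators.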
Whether a night falls on a weekend or a weekday does not appear to make a difference to cancellations.
As a result, the two columns are combined into total_nights, and no_of_weekend_nights and no_of_week_nights are dropped.
# Make a copy in case there are any issues
df3 = df.copy()
df3['total_nights'] = df3['no_of_weekend_nights'] + df3['no_of_week_nights']
df3.drop(labels='no_of_weekend_nights', axis=1, inplace=True)
df3.drop(labels='no_of_week_nights', axis=1, inplace=True)
df3.drop(labels='Booking_ID', axis=1, inplace=True)
df2['total_nights'] = df2['no_of_weekend_nights'] + df2['no_of_week_nights']
df2.drop(labels='no_of_weekend_nights', axis=1, inplace=True)
df2.drop(labels='no_of_week_nights', axis=1, inplace=True)
df2.head()
| | Booking_ID | no_of_adults | no_of_children | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | total_nights |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | INN00001 | 2 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled | 3 |
| 1 | INN00002 | 2 | 0 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled | 5 |
| 2 | INN00003 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled | 3 |
| 3 | INN00004 | 2 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled | 2 |
| 4 | INN00005 | 2 | 0 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled | 2 |
Booking_ID is not needed for modeling and will not help with any comparisons.
#drop the column *Booking_ID* from the dataframe
df2.drop(labels='Booking_ID', axis=1, inplace=True)
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 17 columns):
 # Column Non-Null Count Dtype
--- ------ -------------- -----
 0 no_of_adults 36275 non-null int64
 1 no_of_children 36275 non-null int64
 2 type_of_meal_plan 36275 non-null object
 3 required_car_parking_space 36275 non-null int64
 4 room_type_reserved 36275 non-null object
 5 lead_time 36275 non-null int64
 6 arrival_year 36275 non-null int64
 7 arrival_month 36275 non-null int64
 8 arrival_date 36275 non-null int64
 9 market_segment_type 36275 non-null object
 10 repeated_guest 36275 non-null int64
 11 no_of_previous_cancellations 36275 non-null int64
 12 no_of_previous_bookings_not_canceled 36275 non-null int64
 13 avg_price_per_room 36275 non-null float64
 14 no_of_special_requests 36275 non-null int64
 15 booking_status 36275 non-null object
 16 total_nights 36275 non-null int64
dtypes: float64(1), int64(12), object(4)
memory usage: 4.7+ MB
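The same three steps (add `total_nights`, drop the two night columns, drop `Booking_ID`) are applied above to both `df3` and `df2`. One way to keep the two frames in sync is a small helper that returns a transformed copy. This is only a sketch: `add_total_nights` is a hypothetical name, and the two-row toy frame, including its weekend/week split, is made up for illustration.

```python
import pandas as pd


def add_total_nights(data, drop_id=True):
    """Return a copy with total_nights added and the source columns removed."""
    out = data.copy()
    out["total_nights"] = out["no_of_weekend_nights"] + out["no_of_week_nights"]
    to_drop = ["no_of_weekend_nights", "no_of_week_nights"]
    # Booking_ID is an identifier with no predictive value, so drop it if present.
    if drop_id and "Booking_ID" in out.columns:
        to_drop.append("Booking_ID")
    return out.drop(columns=to_drop)


# Toy frame for illustration only; values are not taken from the dataset.
toy = pd.DataFrame(
    {
        "Booking_ID": ["A1", "A2"],
        "no_of_weekend_nights": [1, 2],
        "no_of_week_nights": [2, 3],
    }
)

result = add_total_nights(toy)
print(result)
```

Because the helper works on a copy, the input frame is left unchanged, which avoids the `inplace=True` calls being run twice by accident when a cell is re-executed.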
stacked_barplot(df2, "total_nights", "booking_status")
| total_nights | Canceled | Not_Canceled | All |
|---|---|---|---|
| All | 11885 | 24390 | 36275 |
| 3 | 3586 | 6466 | 10052 |
| 2 | 2899 | 5573 | 8472 |
| 4 | 1941 | 3952 | 5893 |
| 1 | 1466 | 5138 | 6604 |
| 5 | 823 | 1766 | 2589 |
| 6 | 465 | 566 | 1031 |
| 7 | 383 | 590 | 973 |
| 8 | 79 | 100 | 179 |
| 10 | 58 | 51 | 109 |
| 9 | 53 | 58 | 111 |
| 14 | 27 | 5 | 32 |
| 15 | 26 | 5 | 31 |
| 13 | 15 | 3 | 18 |
| 12 | 15 | 9 | 24 |
| 11 | 15 | 24 | 39 |
| 20 | 8 | 3 | 11 |
| 16 | 5 | 1 | 6 |
| 19 | 5 | 1 | 6 |
| 17 | 4 | 1 | 5 |
| 18 | 3 | 0 | 3 |
| 21 | 3 | 1 | 4 |
| 22 | 2 | 0 | 2 |
| 0 | 2 | 76 | 78 |
| 23 | 1 | 1 | 2 |
| 24 | 1 | 0 | 1 |
distribution_plot_wrt_target(df2, "total_nights", "booking_status")
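As the total number of nights increases, the cancellation rate generally increases: 22.2% of 1-night bookings are canceled, 45.1% of 6-night bookings, and more than 80% of 14- and 15-night bookings.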
# functions to treat outliers by flooring and capping
def treat_outliers(df2, col):
    """
    Treats outliers in a variable
    df2: dataframe
    col: dataframe column
    """
    Q1 = df2[col].quantile(0.25)  # 25th quantile
    Q3 = df2[col].quantile(0.75)  # 75th quantile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR
    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    df2[col] = np.clip(df2[col], Lower_Whisker, Upper_Whisker)
    return df2
def treat_outliers_all(df2, col_list):
    """
    Treat outliers in a list of variables
    df2: dataframe
    col_list: list of dataframe columns
    """
    for c in col_list:
        df2 = treat_outliers(df2, c)
    return df2
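To make the flooring and capping rule concrete, the arithmetic can be worked through for avg_price_per_room using the quartiles reported earlier (25% = 80.30, 75% = 120.00). The sketch below is a standalone illustration, not a call to the functions above: `iqr_whiskers` is a hypothetical helper, and the four sample prices are the minimum, a common value, the median, and the maximum from the summary statistics.

```python
import numpy as np


def iqr_whiskers(q1, q3, k=1.5):
    """Lower and upper whisker for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles of avg_price_per_room taken from the describe() output.
lower, upper = iqr_whiskers(80.30, 120.00)

# Min, a frequent value, median and max of avg_price_per_room.
prices = np.array([0.0, 65.0, 99.45, 540.0])
clipped = np.clip(prices, lower, upper)

print(round(lower, 2), round(upper, 2))
print(clipped)
```

Under this rule, free rooms priced at 0 would be raised to the lower whisker and the 540 maximum lowered to the upper whisker. One caution if the same treatment were applied to every numeric column: for a column such as required_car_parking_space, where both quartiles are 0, the IQR is 0 and every value would be clipped to 0.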
numeric_columns = df2.select_dtypes(include=np.number).columns.to_list()
plt.figure(figsize=(20, 30))
# Draw one boxplot per numeric column
for i, variable in enumerate(numeric_columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df2[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
# Update color properties
boxprops = dict(color="red")  # Change the box color to red
capprops = dict(color="blue")  # Change the cap color to blue
whiskerprops = dict(color="purple")  # Change the whisker color to purple
flierprops = dict(markerfacecolor="teal")  # Change the flier marker color to teal
medianprops = dict(color="violet")  # Change the median line color to violet
# Create a separate styled boxplot. Note: `variable` still holds the last
# column from the loop above, so only that column is drawn here.
fig, ax = plt.subplots()
ax.set_title('Numerical Column Boxplots')
plt.boxplot(df2[variable], whis=1.5, boxprops=boxprops, capprops=capprops,
            whiskerprops=whiskerprops, flierprops=flierprops, medianprops=medianprops)
# Toggle visibility of the entire figure
def toggle_plot(event):
    plt.gcf().set_visible(not plt.gcf().get_visible())
    plt.draw()
cid = plt.gcf().canvas.mpl_connect("key_press_event", toggle_plot)
plt.show()
[Figure: boxplots of the 13 numeric columns (no_of_adults, no_of_children, required_car_parking_space, lead_time, arrival_year, arrival_month, arrival_date, repeated_guest, no_of_previous_cancellations, no_of_previous_bookings_not_canceled, avg_price_per_room, no_of_special_requests, total_nights), followed by a separate styled boxplot titled 'Numerical Column Boxplots']
# Checking the distribution of all numeric columns using histograms.
plt.figure(figsize=(15, 45))
for i in range(len(numeric_columns)):
    plt.subplot(12, 3, i + 1)
    plt.hist(df2[numeric_columns[i]], bins=50, color="teal")
    plt.tight_layout()
    plt.title(numeric_columns[i], fontsize=25)
plt.show()
[Figure: grid of 50-bin histograms, one panel per numeric column: no_of_adults, no_of_children, required_car_parking_space, lead_time, arrival_year, arrival_month, arrival_date, repeated_guest, no_of_previous_cancellations, no_of_previous_bookings_not_canceled, avg_price_per_room, no_of_special_requests, total_nights]
# interquartile range of avg_price_per_room: Q3 = 120 (75%), Q1 = 80.30 (25%)
IQR = 120 - 80.30
# subsets: rooms sold for no price (free), rooms with a low-outlier price, and rooms with a high-outlier price
# note: the fences below are centred on 99.45 rather than on Q1 and Q3 as in the usual Tukey rule
df2_0 = df2[df2.avg_price_per_room == 0]
df2_low = df2[df2.avg_price_per_room < 99.45 - 1.5 * IQR]
df2_high = df2[df2.avg_price_per_room > 99.45 + 1.5 * IQR]
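As a cross-check on the hard-coded quartiles, the fences can also be derived from the data itself. The sketch below is an illustration only: it uses the standard-library `statistics.quantiles` on a small made-up price list (`prices` and `tukey_fences` are hypothetical names, not part of the notebook) and applies the conventional Tukey rule, which places the fences at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) outlier fences using the Tukey rule."""
    # method="inclusive" interpolates linearly between data points,
    # which should agree with the default of pandas' quantile()
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical prices, for illustration only (not the hotel data)
prices = [0.0, 65.0, 80.0, 95.0, 100.0, 110.0, 120.0, 135.0, 300.0]
lower, upper = tukey_fences(prices)
low_outliers = [p for p in prices if p < lower]
high_outliers = [p for p in prices if p > upper]
print("fences:", lower, upper)
print("low outliers:", low_outliers)
print("high outliers:", high_outliers)
```

On the real data the same idea would use `df2["avg_price_per_room"].quantile([0.25, 0.75])` in place of the hard-coded 80.30 and 120.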
# value counts of the categorical columns for rooms sold at a price of zero
for colname in df2_0.select_dtypes(include=["object", "category"]).columns:
    print(df2_0[colname].value_counts(dropna=False))
    print(" ")
# value counts of the categorical columns for rooms sold at a low-outlier price
for colname in df2_low.select_dtypes(include=["object", "category"]).columns:
    print(df2_low[colname].value_counts(dropna=False))
    print(" ")
# value counts of the categorical columns for rooms sold at a high-outlier price
for colname in df2_high.select_dtypes(include=["object", "category"]).columns:
    print(df2_high[colname].value_counts(dropna=False))
    print(" ")
# Labeled barplot for type of meal plan
labeled_barplot(df2_high, "type_of_meal_plan", perc=True, n=10)
print()
labeled_barplot(df2_high, "room_type_reserved", perc=True, n=10)
print()
labeled_barplot(df2_high, "market_segment_type", perc=True, n=10)
print()
labeled_barplot(df2_high, "booking_status", perc=True, n=10)
There are 2301 high outliers.
82% of the high outliers chose meal plan 1.
35.2% of high outliers chose room type 4.
Most popular room types are room type 4, room type 6 and room type 1.
91.7% of the high outliers reserved online.
63.2% of the high outliers were not canceled, which means 36.8% of the bookings were canceled.
545 rooms were sold at no cost to the guests. Only 1% of those bookings were canceled.
686 rooms were sold at a low outlier amount to the guests. Only 3% of those bookings were canceled.
# compute adjusted R-squared
def adj_r2_score(predictors, targets, predictions):
    r2 = r2_score(targets, predictions)
    n = predictors.shape[0]
    k = predictors.shape[1]
    return 1 - ((1 - r2) * (n - 1) / (n - k - 1))


# compute MAPE
def mape_score(targets, predictions):
    return np.mean(np.abs(targets - predictions) / targets) * 100


# compute multiple metrics to check performance of a regression model
def model_performance_regression(model, predictors, target):
    """
    Function to compute different metrics to check regression model performance
    model: regressor
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    r2 = r2_score(target, pred)  # to compute R-squared
    adjr2 = adj_r2_score(predictors, target, pred)  # to compute adjusted R-squared
    rmse = np.sqrt(mean_squared_error(target, pred))  # to compute RMSE
    mae = mean_absolute_error(target, pred)  # to compute MAE
    mape = mape_score(target, pred)  # to compute MAPE
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "RMSE": rmse,
            "MAE": mae,
            "R-squared": r2,
            "Adj. R-squared": adjr2,
            "MAPE": mape,
        },
        index=[0],
    )
    return df_perf
Encode Not Canceled as 0 and Canceled as 1, since the hotel wants to predict which customers might cancel their booking.
df2["booking_status"] = df2["booking_status"].apply(lambda x: 1 if x == "Canceled" else 0)
df2["booking_status"].value_counts()
booking_status
0    24390
1    11885
Name: count, dtype: int64
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 17 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36275 non-null  int64
 1   no_of_children                        36275 non-null  int64
 2   type_of_meal_plan                     36275 non-null  object
 3   required_car_parking_space            36275 non-null  int64
 4   room_type_reserved                    36275 non-null  object
 5   lead_time                             36275 non-null  int64
 6   arrival_year                          36275 non-null  int64
 7   arrival_month                         36275 non-null  int64
 8   arrival_date                          36275 non-null  int64
 9   market_segment_type                   36275 non-null  object
 10  repeated_guest                        36275 non-null  int64
 11  no_of_previous_cancellations          36275 non-null  int64
 12  no_of_previous_bookings_not_canceled  36275 non-null  int64
 13  avg_price_per_room                    36275 non-null  float64
 14  no_of_special_requests                36275 non-null  int64
 15  booking_status                        36275 non-null  int64
 16  total_nights                          36275 non-null  int64
dtypes: float64(1), int64(13), object(3)
memory usage: 4.7+ MB
Convert Categorical to Numerical Values
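# list the categories of each non-numeric column (stored as object dtype, per df2.info())
for colname in df2.select_dtypes(include=["object", "category"]).columns:
    print(df2[colname].value_counts(dropna=False))
    print(" ")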
Splitting the Data
X = df2.drop('booking_status', axis=1)  # Predictor feature columns
Y = df2['booking_status']  # Target class (1 = Canceled, 0 = Not Canceled)
Y.info()
<class 'pandas.core.series.Series'>
RangeIndex: 36275 entries, 0 to 36274
Series name: booking_status
Non-Null Count  Dtype
--------------  -----
36275 non-null  int64
dtypes: int64(1)
memory usage: 283.5 KB
# Identify object- and category-type columns
object_cols = X.select_dtypes(include=['object', 'category']).columns
# Convert those columns to dummy variables
# drop_first=True drops the first category of each column to avoid multicollinearity
# dtype=int ensures the output is integer (numeric 0 and 1) instead of Boolean
X = pd.get_dummies(X, columns=object_cols, dtype=int, drop_first=True)
X.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36275 entries, 0 to 36274
Data columns (total 26 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          36275 non-null  int64
 1   no_of_children                        36275 non-null  int64
 2   required_car_parking_space            36275 non-null  int64
 3   lead_time                             36275 non-null  int64
 4   arrival_year                          36275 non-null  int64
 5   arrival_month                         36275 non-null  int64
 6   arrival_date                          36275 non-null  int64
 7   repeated_guest                        36275 non-null  int64
 8   no_of_previous_cancellations          36275 non-null  int64
 9   no_of_previous_bookings_not_canceled  36275 non-null  int64
 10  avg_price_per_room                    36275 non-null  float64
 11  no_of_special_requests                36275 non-null  int64
 12  total_nights                          36275 non-null  int64
 13  type_of_meal_plan_Meal Plan 2         36275 non-null  int64
 14  type_of_meal_plan_Meal Plan 3         36275 non-null  int64
 15  type_of_meal_plan_Not Selected        36275 non-null  int64
 16  room_type_reserved_Room_Type 2        36275 non-null  int64
 17  room_type_reserved_Room_Type 3        36275 non-null  int64
 18  room_type_reserved_Room_Type 4        36275 non-null  int64
 19  room_type_reserved_Room_Type 5        36275 non-null  int64
 20  room_type_reserved_Room_Type 6        36275 non-null  int64
 21  room_type_reserved_Room_Type 7        36275 non-null  int64
 22  market_segment_type_Complementary     36275 non-null  int64
 23  market_segment_type_Corporate         36275 non-null  int64
 24  market_segment_type_Offline           36275 non-null  int64
 25  market_segment_type_Online            36275 non-null  int64
dtypes: float64(1), int64(25)
memory usage: 7.2 MB
X.head()
| | no_of_adults | no_of_children | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | total_nights | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 0 | 224 | 2017 | 10 | 2 | 0 | 0 | 0 | 65.00 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 0 | 5 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68 | 1 | 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 0 | 1 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 211 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 0 | 0 | 48 | 2018 | 4 | 11 | 0 | 0 | 0 | 94.50 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1, stratify=Y
)
# Display the first few rows of X_train
print(X_train.head())
no_of_adults no_of_children required_car_parking_space lead_time \
6870 2 0 0 5
531 2 1 0 86
3394 1 0 0 105
23540 1 0 0 85
15302 2 0 0 309
arrival_year arrival_month arrival_date repeated_guest \
6870 2018 12 30 0
531 2018 12 8 0
3394 2018 5 5 0
23540 2018 12 3 0
15302 2018 5 13 0
no_of_previous_cancellations no_of_previous_bookings_not_canceled \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 0
avg_price_per_room no_of_special_requests total_nights \
6870 116.00 1 5
531 122.00 0 3
3394 117.30 0 3
23540 98.00 0 2
15302 101.00 0 3
type_of_meal_plan_Meal Plan 2 type_of_meal_plan_Meal Plan 3 \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 1 0
type_of_meal_plan_Not Selected room_type_reserved_Room_Type 2 \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 0
room_type_reserved_Room_Type 3 room_type_reserved_Room_Type 4 \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 0
room_type_reserved_Room_Type 5 room_type_reserved_Room_Type 6 \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 0
room_type_reserved_Room_Type 7 market_segment_type_Complementary \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 0
market_segment_type_Corporate market_segment_type_Offline \
6870 0 0
531 0 0
3394 0 0
23540 0 0
15302 0 1
market_segment_type_Online
6870 1
531 1
3394 1
23540 1
15302 0
y_train.value_counts()
booking_status
0    17073
1     8319
Name: count, dtype: int64
# checking the shape of the train and test data
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
# adding constant to the train data
X_train1 = sm.add_constant(X_train)
# adding constant to the test data
X_test1 = sm.add_constant(X_test)
print("{0:0.2f}% data is in training set".format((len(X_train1)/len(df.index)) * 100))
print("{0:0.2f}% data is in test set".format((len(X_test1)/len(df.index)) * 100))
70.00% data is in training set
30.00% data is in test set
print("Shape of Training set : ", X_train1.shape)
print()
print("Shape of test set : ", X_test1.shape)
print()
print("Percentage of classes in training set:")
print()
print(y_train.value_counts(normalize=True))
print()
print("Percentage of classes in test set:")
print()
print(y_test.value_counts(normalize=True))
Shape of Training set :  (25392, 27)

Shape of test set :  (10883, 27)

Percentage of classes in training set:

booking_status
0   0.67
1   0.33
Name: proportion, dtype: float64

Percentage of classes in test set:

booking_status
0   0.67
1   0.33
Name: proportion, dtype: float64
We saw earlier that about 67.2% of observations belong to class 0 (Not Canceled) and about 32.8% belong to class 1 (Canceled); this ratio is preserved in the train and test sets.
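The class shares follow directly from the counts printed by the `value_counts()` calls (24390 and 11885 overall; 17073 and 8319 in the training set). A small arithmetic check in plain Python, for illustration; the helper name `class_shares` is an assumption and not part of the notebook:

```python
def class_shares(n_not_canceled, n_canceled):
    """Return the share of each class as percentages rounded to 2 decimals."""
    total = n_not_canceled + n_canceled
    return (
        round(100 * n_not_canceled / total, 2),
        round(100 * n_canceled / total, 2),
    )


# Counts taken from the value_counts() outputs shown in the notebook
full_data = class_shares(24390, 11885)
train_set = class_shares(17073, 8319)
print("Full data (not canceled %, canceled %):", full_data)
print("Train set (not canceled %, canceled %):", train_set)
```

Both calls give the same shares to two decimals, which is what `stratify=Y` in `train_test_split` is meant to achieve.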
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1},
        index=[0],
    )
    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="", cmap='nipy_spectral')
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="", cmap='viridis')
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
## Function to create confusion matrix
def make_confusion_matrix(model, y_actual, labels=[1, 0]):
    '''
    model : classifier to predict values of X
    y_actual : ground truth
    note: predictions are made on the global X_test, and the labels argument is
    overwritten below, so the matrix is always ordered [0, 1]
    '''
    y_predict = model.predict(X_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=['Predicted - No', 'Predicted - Yes'])
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    labels = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    labels = np.asarray(labels).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(df_cm, annot=labels, fmt='', cmap='viridis')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
## Function to calculate f1 score
def get_f1_score(model, predictors, target):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    prediction = model.predict(predictors)
    return f1_score(target, prediction)
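A model can make wrong predictions in two ways:
* Predicting that a booking will be canceled when the guest does not cancel (false positive).
* Predicting that a booking will not be canceled when the guest does cancel (false negative).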
Which case is more important?
Both are important:
If we anticipate a guest’s cancellation but they don’t cancel, we’ll have reassigned their room to another guest. This means we won’t have a room available for them upon arrival, resulting in significant costs for the hotel (for example, offering a complimentary upgraded room). Additionally, we risk losing repeat customers and receiving negative reviews.
If we anticipate that a guest won’t cancel their reservation, but they end up doing so, we not only miss out on the revenue from their booking but also incur costs for remarketing the room. Additionally, we’ll likely need to rebook the room at a discounted rate.
How can you reduce these costs, i.e. maximize true positives while limiting both false positives and false negatives?
Maximize the F1 score.
f1_score is computed as
$$f1\_score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
The model_performance_classification_statsmodels function will be used to check the performance of the models.
Summary:
Accuracy: tells us how often the model makes correct predictions out of all predictions. It's like checking how many answers you got right on a test out of all the questions. ((TP + TN) / (TP + TN + FP + FN))
Precision: tells us how many of the predicted positive cases were actually positive. It's like asking, "When the model says something is true, how often is it right?" (TP / (TP + FP))
Recall: tells us how many of the actual positive cases were predicted correctly by the model. It's like asking, "Out of all the actual positive cases, how many did the model find?" (TP / (TP + FN))
F1 Score: is the harmonic mean of precision and recall. It's useful when you care about both false positives and false negatives. It's like trying to find a sweet spot between "When the model says something is true, how often is it right?" and "Out of all the actual positive cases, how many did the model find?"
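To make these definitions concrete, the sketch below computes the four metrics from raw confusion-matrix counts. The counts are invented for illustration and do not come from the hotel data; `metrics_from_counts` is a hypothetical helper, not a library function.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many are right
    recall = tp / (tp + fn)  # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts: 60 true positives, 20 false positives,
# 40 false negatives, 80 true negatives
scores = metrics_from_counts(tp=60, fp=20, fn=40, tn=80)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this example precision (0.75) is higher than recall (0.60), and F1 (about 0.667) sits between the two, closer to the lower value, which is why it penalizes a model that does well on only one of them.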
Building the Logistic Regression model (with Sklearn library)
lg = LogisticRegression(solver="liblinear", random_state=1)
model = lg.fit(X_train1, y_train)
Model performance on training set
# predicting on training set
y_pred_train = lg.predict(X_train1)
print("Training set performance:")
print("Accuracy:", accuracy_score(y_train, y_pred_train))
print("Precision:", precision_score(y_train, y_pred_train))
print("Recall:", recall_score(y_train, y_pred_train))
print("F1:", f1_score(y_train, y_pred_train))
Training set performance:
Accuracy: 0.8064744801512287
Precision: 0.7431795457791744
Recall: 0.6254357494891213
F1: 0.679242819843342
Performance on test set
# predicting on the test set
y_pred_test = lg.predict(X_test1)
print("Test set performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_test))
print("Precision:", precision_score(y_test, y_pred_test))
print("Recall:", recall_score(y_test, y_pred_test))
print("F1:", f1_score(y_test, y_pred_test))
Test set performance:
Accuracy: 0.8014334282826426
Precision: 0.733932733932734
Recall: 0.6180594503645541
F1: 0.671030598264576
The training and testing precision are very close: 74.3% on training and 73.4% on testing.
The f1_score on the train and test sets is comparable (67.9% versus 67.1%), which indicates that the model generalizes well.
Building the Logistic Regression model (with statsmodels library)
X_train1 = X_train1.astype(float) # Convert all columns to float
X_train1.dtypes
const                                   float64
no_of_adults                            float64
no_of_children                          float64
required_car_parking_space              float64
lead_time                               float64
arrival_year                            float64
arrival_month                           float64
arrival_date                            float64
repeated_guest                          float64
no_of_previous_cancellations            float64
no_of_previous_bookings_not_canceled    float64
avg_price_per_room                      float64
no_of_special_requests                  float64
total_nights                            float64
type_of_meal_plan_Meal Plan 2           float64
type_of_meal_plan_Meal Plan 3           float64
type_of_meal_plan_Not Selected          float64
room_type_reserved_Room_Type 2          float64
room_type_reserved_Room_Type 3          float64
room_type_reserved_Room_Type 4          float64
room_type_reserved_Room_Type 5          float64
room_type_reserved_Room_Type 6          float64
room_type_reserved_Room_Type 7          float64
market_segment_type_Complementary       float64
market_segment_type_Corporate           float64
market_segment_type_Offline             float64
market_segment_type_Online              float64
dtype: object
y_train1 = y_train.astype(float)  # Convert the target to float
y_train1.dtypes
dtype('float64')
import warnings
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)
# fitting logistic regression model
logit = sm.Logit(y_train, X_train1.astype(float))
lg = logit.fit(disp=False)
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25365
Method: MLE Df Model: 26
Date: Fri, 31 May 2024 Pseudo R-squ.: 0.3316
Time: 23:59:04 Log-Likelihood: -10734.
converged: False LL-Null: -16060.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -893.4670 121.193 -7.372 0.000 -1131.002 -655.932
no_of_adults 0.0383 0.038 1.017 0.309 -0.036 0.112
no_of_children 0.0851 0.061 1.404 0.160 -0.034 0.204
required_car_parking_space -1.6099 0.137 -11.751 0.000 -1.878 -1.341
lead_time 0.0157 0.000 58.887 0.000 0.015 0.016
arrival_year 0.4414 0.060 7.350 0.000 0.324 0.559
arrival_month -0.0477 0.006 -7.349 0.000 -0.060 -0.035
arrival_date 0.0032 0.002 1.655 0.098 -0.001 0.007
repeated_guest -1.9232 0.766 -2.509 0.012 -3.425 -0.421
no_of_previous_cancellations 0.3475 0.101 3.430 0.001 0.149 0.546
no_of_previous_bookings_not_canceled -1.3496 0.883 -1.529 0.126 -3.080 0.380
avg_price_per_room 0.0183 0.001 24.736 0.000 0.017 0.020
no_of_special_requests -1.4886 0.030 -48.930 0.000 -1.548 -1.429
total_nights 0.0695 0.010 7.299 0.000 0.051 0.088
type_of_meal_plan_Meal Plan 2 0.1823 0.067 2.728 0.006 0.051 0.313
type_of_meal_plan_Meal Plan 3 12.9000 425.208 0.030 0.976 -820.493 846.293
type_of_meal_plan_Not Selected 0.1967 0.053 3.691 0.000 0.092 0.301
room_type_reserved_Room_Type 2 -0.4199 0.133 -3.150 0.002 -0.681 -0.159
room_type_reserved_Room_Type 3 1.2239 1.884 0.650 0.516 -2.469 4.917
room_type_reserved_Room_Type 4 -0.2730 0.053 -5.120 0.000 -0.378 -0.168
room_type_reserved_Room_Type 5 -0.6731 0.215 -3.135 0.002 -1.094 -0.252
room_type_reserved_Room_Type 6 -0.8439 0.153 -5.532 0.000 -1.143 -0.545
room_type_reserved_Room_Type 7 -1.3645 0.297 -4.594 0.000 -1.947 -0.782
market_segment_type_Complementary -18.9188 554.615 -0.034 0.973 -1105.944 1068.106
market_segment_type_Corporate -0.8734 0.276 -3.170 0.002 -1.413 -0.333
market_segment_type_Offline -1.7715 0.263 -6.723 0.000 -2.288 -1.255
market_segment_type_Online 0.0072 0.261 0.027 0.978 -0.504 0.518
========================================================================================================
There are 25392 observations.
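Eight predictors have P>|z| greater than 0.05: no_of_adults, no_of_children, arrival_date, no_of_previous_bookings_not_canceled, type_of_meal_plan_Meal Plan 3, room_type_reserved_Room_Type 3, market_segment_type_Complementary, and market_segment_type_Online. These are not statistically significant at the 5% level.
market_segment_type_Complementary has a coefficient of -18.9188 and type_of_meal_plan_Meal Plan 3 has a coefficient of 12.9000. Both have very large standard errors (554.615 and 425.208), and the summary reports converged: False, so these two estimates are unreliable.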
print("Training performance:")
model_performance_classification_statsmodels(lg, X_train1, y_train)
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81 | 0.63 | 0.74 | 0.68 |
In order to make statistical inferences from a logistic regression model, it is important to ensure that there is no multicollinearity present in the data.
VIF standards:
* If VIF is between 1 and 5, then there is low multicollinearity.
* If VIF is between 5 and 10, we say there is moderate multicollinearity.
* If VIF exceeds 10, it shows signs of high multicollinearity.
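To make these bands concrete: the VIF of a predictor equals 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The sketch below is illustrative only. The helper names `vif_from_r_squared`, `r_squared_from_vif`, and `vif_band` are hypothetical and are not part of statsmodels, and the R² values are made up.

```python
def vif_from_r_squared(r_squared):
    """VIF of a predictor whose regression on the other predictors has this R^2."""
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Invert the formula: share of a predictor's variance explained by the others."""
    return 1.0 - 1.0 / vif


def vif_band(vif):
    """Map a VIF value to the rule-of-thumb bands listed above."""
    if vif > 10:
        return "high"
    if vif >= 5:
        return "moderate"
    return "low"


# Made-up R^2 values, chosen only to land in each band
for r2 in (0.0, 0.5, 0.85, 0.95):
    vif = vif_from_r_squared(r2)
    print(f"R^2 = {r2:.2f} -> VIF = {vif:.2f} ({vif_band(vif)})")
```

Read in reverse, a VIF above 10 means that more than 90% of the predictor's variance is explained by the other predictors.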
# defining a function to check VIF
def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns
    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
checking_vif(X_train1).sort_values(by='VIF', ascending=False)
| | feature | VIF |
|---|---|---|
| 0 | const | 39547263.42 |
| 26 | market_segment_type_Online | 69.47 |
| 25 | market_segment_type_Offline | 62.51 |
| 24 | market_segment_type_Corporate | 16.63 |
| 23 | market_segment_type_Complementary | 4.35 |
| 11 | avg_price_per_room | 2.03 |
| 2 | no_of_children | 2.01 |
| 21 | room_type_reserved_Room_Type 6 | 1.99 |
| 8 | repeated_guest | 1.75 |
| 10 | no_of_previous_bookings_not_canceled | 1.57 |
| 5 | arrival_year | 1.43 |
| 4 | lead_time | 1.40 |
| 19 | room_type_reserved_Room_Type 4 | 1.36 |
| 1 | no_of_adults | 1.34 |
| 9 | no_of_previous_cancellations | 1.32 |
| 16 | type_of_meal_plan_Not Selected | 1.28 |
| 6 | arrival_month | 1.28 |
| 14 | type_of_meal_plan_Meal Plan 2 | 1.26 |
| 12 | no_of_special_requests | 1.25 |
| 17 | room_type_reserved_Room_Type 2 | 1.09 |
| 13 | total_nights | 1.09 |
| 22 | room_type_reserved_Room_Type 7 | 1.09 |
| 3 | required_car_parking_space | 1.03 |
| 20 | room_type_reserved_Room_Type 5 | 1.03 |
| 15 | type_of_meal_plan_Meal Plan 3 | 1.01 |
| 7 | arrival_date | 1.01 |
| 18 | room_type_reserved_Room_Type 3 | 1.00 |
Observations:
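Three of the market segment dummy variables (Online, Offline, and Corporate) have a VIF above 10; every other predictor is below 5.
High VIF values among the dummy variables of a single categorical feature are expected and are commonly ignored, so we treat the no-multicollinearity assumption as satisfied for the remaining predictors.
Next, check the p-values of the predictor variables for significance.
Dropping a variable can change the p-values of the others, so variables are dropped one at a time and the model is refit after each drop.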
# running a loop to drop variables with high p-value
# initial list of columns
cols = X_train1.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    X_train_aux = X_train1[cols]
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['const', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'repeated_guest', 'no_of_previous_cancellations', 'avg_price_per_room', 'no_of_special_requests', 'total_nights', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Corporate', 'market_segment_type_Offline']
X_train2 = X_train1[selected_features]
logit2 = sm.Logit(y_train, X_train2.astype(float))
lg2 = logit2.fit(disp=False)
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 25392
Model: Logit Df Residuals: 25373
Method: MLE Df Model: 18
Date: Fri, 31 May 2024 Pseudo R-squ.: 0.3306
Time: 23:59:07 Log-Likelihood: -10751.
converged: True LL-Null: -16060.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -875.3634 120.780 -7.248 0.000 -1112.088 -638.639
required_car_parking_space -1.6098 0.137 -11.762 0.000 -1.878 -1.342
lead_time 0.0158 0.000 59.844 0.000 0.015 0.016
arrival_year 0.4325 0.060 7.225 0.000 0.315 0.550
arrival_month -0.0494 0.006 -7.644 0.000 -0.062 -0.037
repeated_guest -3.0704 0.600 -5.117 0.000 -4.246 -1.894
no_of_previous_cancellations 0.2899 0.077 3.743 0.000 0.138 0.442
avg_price_per_room 0.0189 0.001 26.505 0.000 0.017 0.020
no_of_special_requests -1.4831 0.030 -49.225 0.000 -1.542 -1.424
total_nights 0.0713 0.009 7.514 0.000 0.053 0.090
type_of_meal_plan_Meal Plan 2 0.1749 0.067 2.619 0.009 0.044 0.306
type_of_meal_plan_Not Selected 0.2077 0.053 3.936 0.000 0.104 0.311
room_type_reserved_Room_Type 2 -0.3760 0.129 -2.912 0.004 -0.629 -0.123
room_type_reserved_Room_Type 4 -0.2716 0.052 -5.265 0.000 -0.373 -0.170
room_type_reserved_Room_Type 5 -0.6792 0.214 -3.175 0.002 -1.098 -0.260
room_type_reserved_Room_Type 6 -0.7378 0.120 -6.161 0.000 -0.972 -0.503
room_type_reserved_Room_Type 7 -1.3168 0.291 -4.522 0.000 -1.887 -0.746
market_segment_type_Corporate -0.8955 0.103 -8.684 0.000 -1.098 -0.693
market_segment_type_Offline -1.7803 0.052 -34.463 0.000 -1.882 -1.679
==================================================================================================
No p-value is greater than 0.05.
Coefficients
Positive - lead_time, arrival_year, no_of_previous_cancellations, avg_price_per_room, total_nights, type_of_meal_plan_Meal Plan 2, type_of_meal_plan_Not Selected
Negative - required_car_parking_space, arrival_month, repeated_guest, no_of_special_requests, room_type_reserved_Room_Type 2, room_type_reserved_Room_Type 4, room_type_reserved_Room_Type 5, room_type_reserved_Room_Type 6, room_type_reserved_Room_Type 7, market_segment_type_Corporate, market_segment_type_Offline
Positive - an increase in the variable increases the chance of a booking being canceled.
Negative - an increase in the variable decreases the chance of a booking being canceled.
Coefficients need to be converted to odds
In logistic regression, each coefficient b is the change in the logarithm of the odds of cancellation for a one-unit increase in the variable, holding the other variables constant. Taking the exponential of a coefficient gives the multiplicative change in the odds.
odds = exp(b)
Percent change in odds = (exp(b) - 1) * 100
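As a worked check of the two formulas above, the sketch below applies them to the lead_time and repeated_guest coefficients as printed (rounded) in the lg2 summary. The function names `odds_ratio` and `percent_change_in_odds` and the `units` argument are hypothetical helpers added for illustration.

```python
import math


def odds_ratio(coef, units=1.0):
    """Multiplicative change in the odds for a `units`-sized increase in the variable."""
    return math.exp(coef * units)


def percent_change_in_odds(coef, units=1.0):
    """Percent change in the odds for a `units`-sized increase in the variable."""
    return (odds_ratio(coef, units) - 1.0) * 100.0


# Coefficients copied from the lg2 summary (rounded to four decimals there)
lead_time_coef = 0.0158
repeated_guest_coef = -3.0704

print(round(percent_change_in_odds(lead_time_coef), 2))       # 1.59
print(round(percent_change_in_odds(repeated_guest_coef), 2))  # -95.36

# The effect compounds: 30 extra days of lead time multiply the odds by exp(30 * b)
print(round(odds_ratio(lead_time_coef, units=30), 2))
```

The last line shows why a small per-unit coefficient can still matter: the per-day effect of lead_time is multiplied across every additional day.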
Since all variables in lg2 have a p-value less than 0.05 we can consider that our final model.
# Logistic regression models log(odds) as a linear function of the predictor variables, so each coefficient is a change in log(odds).
# Exponentiating a coefficient expresses that change in terms of odds, which is easier to interpret.
# converting coefficients to odds
odds = np.exp(lg2.params)
# finding the percentage change
perc_change_odds = (np.exp(lg2.params) - 1) * 100
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odds": perc_change_odds}, index=X_train2.columns).sort_values(by='Change_odds')
| | Odds | Change_odds |
|---|---|---|
| const | 0.00 | -100.00 |
| repeated_guest | 0.05 | -95.36 |
| market_segment_type_Offline | 0.17 | -83.14 |
| required_car_parking_space | 0.20 | -80.01 |
| no_of_special_requests | 0.23 | -77.31 |
| room_type_reserved_Room_Type 7 | 0.27 | -73.20 |
| market_segment_type_Corporate | 0.41 | -59.16 |
| room_type_reserved_Room_Type 6 | 0.48 | -52.18 |
| room_type_reserved_Room_Type 5 | 0.51 | -49.30 |
| room_type_reserved_Room_Type 2 | 0.69 | -31.34 |
| room_type_reserved_Room_Type 4 | 0.76 | -23.78 |
| arrival_month | 0.95 | -4.82 |
| lead_time | 1.02 | 1.59 |
| avg_price_per_room | 1.02 | 1.91 |
| total_nights | 1.07 | 7.39 |
| type_of_meal_plan_Meal Plan 2 | 1.19 | 19.12 |
| type_of_meal_plan_Not Selected | 1.23 | 23.08 |
| no_of_previous_cancellations | 1.34 | 33.62 |
| arrival_year | 1.54 | 54.11 |
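Top 5 Coefficients that will cause a negative change: repeated_guest (-95.36%), market_segment_type_Offline (-83.14%), required_car_parking_space (-80.01%), no_of_special_requests (-77.31%), room_type_reserved_Room_Type 7 (-73.20%)
Top 5 Coefficients that will cause a positive change: arrival_year (54.11%), no_of_previous_cancellations (33.62%), type_of_meal_plan_Not Selected (23.08%), type_of_meal_plan_Meal Plan 2 (19.12%), total_nights (7.39%)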
Model Performance on final training set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train)
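True Negative - 15235 (bookings correctly predicted as not canceled)
True Positive - 5265 (bookings correctly predicted as canceled)
False Positive - 1838
False Negative - 3054 (the four counts must sum to the 25392 training observations, and this value agrees with the reported recall of 0.63)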
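The four metrics reported for this model can be recomputed from the confusion-matrix counts. The sketch below assumes that class 1 (Canceled) is the positive class, that 5265 is the number of canceled bookings predicted correctly, and that the false-negative count is whatever remains of the 25392 training observations. The function name `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # share of actual cancellations that were flagged
    precision = tp / (tp + fp)     # share of flagged bookings that really canceled
    f1 = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Recall": recall, "Precision": precision, "F1": f1}


n_obs = 25392                      # training observations in the Logit summary
tp, tn, fp = 5265, 15235, 1838     # assumption: 5265 are correctly predicted cancellations
fn = n_obs - tp - tn - fp          # remainder, derived rather than read off the plot

metrics = classification_metrics(tp, tn, fp, fn)
print({name: round(value, 2) for name, value in metrics.items()})
```

Under these assumptions the rounded values match the training performance table (0.81, 0.63, 0.74, 0.68).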
log_reg_model_train_perf = model_performance_classification_statsmodels(lg2, X_train2, y_train)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81 | 0.63 | 0.74 | 0.68 |
The model has an F1 score of 0.68 on the training set.
X_test2 = X_test1[list(X_train2.columns)]
vif_series = pd.Series(
[variance_inflation_factor(X_train2.values, i) for i in range(X_train2.shape[1])],
index=X_train2.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

const                             39098190.79
required_car_parking_space               1.03
lead_time                                1.36
arrival_year                             1.42
arrival_month                            1.26
repeated_guest                           1.49
no_of_previous_cancellations             1.18
avg_price_per_room                       1.62
no_of_special_requests                   1.22
total_nights                             1.08
type_of_meal_plan_Meal Plan 2            1.25
type_of_meal_plan_Not Selected           1.24
room_type_reserved_Room_Type 2           1.03
room_type_reserved_Room_Type 4           1.27
room_type_reserved_Room_Type 5           1.02
room_type_reserved_Room_Type 6           1.25
room_type_reserved_Room_Type 7           1.03
market_segment_type_Corporate            1.41
market_segment_type_Offline              1.56
dtype: float64
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test)
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg2, X_test2, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80 | 0.63 | 0.74 | 0.68 |
ROC Curve:
What It Is: The ROC curve is a graphical representation that shows the performance of a binary classification model across different threshold settings.
Simple Explanation: Imagine you have a model that predicts whether an email is spam or not. The ROC curve tells you how well your model can distinguish between spam and non-spam emails as you adjust the threshold for classifying an email as spam.
X-axis: False Positive Rate (FPR) - It represents the ratio of false positive predictions (predicting spam when it's not) to all actual negative instances.
Y-axis: True Positive Rate (TPR) - It represents the ratio of true positive predictions (correctly predicting spam) to all actual positive instances.
Plotting Points: The ROC curve is generated by plotting TPR against FPR for various threshold settings.
Interpretation: ROC-AUC ranges from 0 to 1, where 1 represents a perfect model (all true positives, no false positives), and 0.5 represents a random model (no discrimination between classes).
Comparing Models: You can use ROC-AUC to compare different models. The model with a higher ROC-AUC value is generally considered to be better at distinguishing between the classes.
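The description above can be traced by hand on a tiny example. The sketch below is plain Python on four made-up scores. It is not the scikit-learn implementation behind `roc_curve` and `roc_auc_score`, and the names `roc_points` and `auc` are hypothetical.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold over each distinct score."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels (1 = positive class) and predicted probabilities
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
print(points)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(points))  # 0.75
```

One negative (0.4) is scored above one positive (0.35), so the curve steps right before it finishes climbing and the area falls below 1.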
ROC-AUC (Training)
logit_roc_auc_train = roc_auc_score(y_train, lg2.predict(X_train2))
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
[Figure: Receiver operating characteristic curve for the training set; x-axis: False Positive Rate, y-axis: True Positive Rate]
With an area under the ROC curve of 0.86 on the training set, the logistic regression model separates canceled from non-canceled bookings well.
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg2.predict(X_train2))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.34049961761164615
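The rule `np.argmax(tpr - fpr)` used above picks the threshold that maximizes TPR minus FPR, a quantity known as Youden's J statistic. The sketch below reproduces the rule in plain Python on made-up scores so that each step can be checked by hand. The name `youden_threshold` is hypothetical.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) where J = TPR - FPR is largest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        j = tp / positives - fp / negatives
        if j > best_j:  # keep the first (highest) threshold on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


# Made-up labels and predicted probabilities: four negatives, four positives
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8]

print(youden_threshold(y_true, scores))  # (0.4, 0.75)
```

At 0.4 every positive is caught (TPR = 1.0) at the cost of one false positive out of four negatives (FPR = 0.25). Youden's J weights both error types equally, which may not match the business cost of a missed cancellation.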
# creating confusion matrix
confusion_matrix_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.79 | 0.76 | 0.65 | 0.70 |
Accuracy - decreased by 0.02 (0.81 to 0.79)
Recall - increased by 0.13
Precision - decreased by 0.09
F1 score - increased by 0.02
Since recall and F1 score both increased, the model with this threshold is more useful for INN Hotels' intended use case.
Model performance on test set
logit_roc_auc_test = roc_auc_score(y_test, lg2.predict(X_test2))
fpr, tpr, thresholds = roc_curve(y_test, lg2.predict(X_test2))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
[Figure: Receiver operating characteristic curve for the test set; x-axis: False Positive Rate, y-axis: True Positive Rate]
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.78 | 0.76 | 0.64 | 0.69 |
On the test set the model performs almost the same as on the training set.
Accuracy - decrease of 0.01
Recall - same
Precision - decrease of 0.01
F1 score - decrease of 0.01
Optimal threshold using the Precision-Recall curve
y_scores = lg2.predict(X_train2)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
[Figure: precision and recall plotted against the classification threshold for the training set]
At the threshold of ~ 0.42, we get balanced recall and precision.
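The balance point was read off the plot by eye. A programmatic alternative is to scan the candidate thresholds and keep the one where precision and recall are closest. The sketch below does this in plain Python on made-up scores. The name `balanced_threshold` is hypothetical, and this is not how the 0.42 value above was obtained.

```python
def balanced_threshold(y_true, scores):
    """Return (threshold, precision, recall) where |precision - recall| is smallest."""
    best = None
    for threshold in sorted(set(scores)):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
        if tp + fp == 0:  # nothing flagged: precision is undefined
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, threshold, precision, recall)
    return best[1], best[2], best[3]


# Made-up labels and predicted probabilities: four negatives, four positives
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8]

print(balanced_threshold(y_true, scores))  # (0.5, 0.75, 0.75)
```

On the notebook's data the same scan could be run over the `tre`, `prec[:-1]`, and `rec[:-1]` arrays returned by `precision_recall_curve`.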
print(tre)
[4.36283975e-07 1.76602177e-06 3.18008213e-06 ... 9.95935711e-01 9.97452510e-01 9.98159837e-01]
# setting the threshold
optimal_threshold_curve = 0.42
Model Performance on Training Set
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_train2, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_train2, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80 | 0.70 | 0.70 | 0.70 |
Model is performing well on the training set.
Threshold went from 0.34 to 0.42.
Accuracy - increased by 0.01
Recall - decreased by 0.06
Precision - increased by 0.05
F1 Score - stayed the same.
Although accuracy and precision increased, recall decreased and F1 score stayed the same.
The 0.34 threshold is better on recall.
# creating confusion matrix
confusion_matrix_statsmodels(lg2, X_test2, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg2, X_test2, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.80 | 0.69 | 0.69 | 0.69 |
Model is performing well on the testing set.
Threshold went from 0.34 to 0.42.
Accuracy - increased by 0.02
Recall - decreased by 0.07
Precision - increased by 0.05
F1 Score - stayed the same.
Although accuracy and precision increased, recall decreased and F1 score stayed the same.
The 0.34 threshold is better on recall.
Test performance matches training performance almost exactly.
Logistic Regression model summary
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
# the first column is the statsmodels Logit model at the helper's default threshold (it is not an sklearn model)
models_train_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.34 Threshold",
    "Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-default Threshold | Logistic Regression-0.34 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.81 | 0.79 | 0.80 |
| Recall | 0.63 | 0.76 | 0.70 |
| Precision | 0.74 | 0.65 | 0.70 |
| F1 | 0.68 | 0.70 | 0.70 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
# the first column is the statsmodels Logit model at the helper's default threshold (it is not an sklearn model)
models_test_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.34 Threshold",
    "Logistic Regression-0.42 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression-default Threshold | Logistic Regression-0.34 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.80 | 0.78 | 0.80 |
| Recall | 0.63 | 0.76 | 0.69 |
| Precision | 0.74 | 0.64 | 0.69 |
| F1 | 0.68 | 0.69 | 0.69 |
Conclusions
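Top 5 Coefficients that will cause a negative change: repeated_guest, market_segment_type_Offline, required_car_parking_space, no_of_special_requests, room_type_reserved_Room_Type 7
Top 5 Coefficients that will cause a positive change: arrival_year, no_of_previous_cancellations, type_of_meal_plan_Not Selected, type_of_meal_plan_Meal Plan 2, total_nights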
criterion: The function to measure the quality of a split ("gini" or "entropy").
Entropy, a concept from information theory, measures the amount of disorder or unpredictability in the data at a node. For a two-class problem, entropy ranges from 0 (pure node) to 1 (maximally mixed node with an equal distribution of classes).
Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset. For a two-class problem, Gini impurity ranges from 0 (pure node) to 0.5 (evenly mixed classes in the node).
Preference: Gini impurity is typically preferred for its computational efficiency, but entropy is sometimes chosen for its stronger theoretical foundation from information theory.
splitter: The strategy used to choose the split at each node ("best" or "random").
max_depth: The maximum depth of the tree (None for no limit).
min_samples_split: The minimum number of samples required to split an internal node (int or float).
min_samples_leaf: The minimum number of samples required to be at a leaf node (int or float).
min_weight_fraction_leaf: The minimum weighted fraction of the sum total of weights required to be at a leaf node.
max_features: The number of features to consider when looking for the best split (int, float, string, or None).
random_state: Controls the randomness of the estimator (int, RandomState instance, or None).
max_leaf_nodes: Grow a tree with the maximum number of leaf nodes (None for unlimited).
min_impurity_decrease: A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
class_weight: Weights associated with classes (dict, list of dicts, "balanced", or None).
ccp_alpha: Complexity parameter used for Minimal Cost-Complexity Pruning (non-negative float).
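The ranges quoted for the two `criterion` options can be checked by hand. The sketch below computes both impurity measures from class proportions in plain Python. The function names `entropy` and `gini` are illustrative and are not scikit-learn functions. The 0.67 / 0.33 mix is the training-set class split reported in this notebook.

```python
import math


def entropy(proportions):
    """Shannon entropy in bits: -sum(p * log2(p)) over classes with p > 0."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


def gini(proportions):
    """Gini impurity: 1 - sum(p^2) over classes."""
    return 1.0 - sum(p * p for p in proportions)


pure_node = [1.0, 0.0]        # only one class present
even_node = [0.5, 0.5]        # two classes, evenly mixed
training_mix = [0.67, 0.33]   # Not_Canceled / Canceled share in the training set

for name, node in [("pure", pure_node), ("even", even_node), ("training mix", training_mix)]:
    print(f"{name}: entropy = {entropy(node):.3f}, gini = {gini(node):.3f}")
```

Both measures are 0 for the pure node and reach their two-class maximum (1 for entropy, 0.5 for Gini) at the even split. The training mix sits a little below those maxima.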
# Create a DecisionTreeClassifier with all parameters specified
clf_example = DecisionTreeClassifier(
    criterion='entropy',  # Measure split quality with 'entropy' (the alternative is 'gini'); both quantify impurity at a node
    splitter='random',  # Choose the best random split at each node instead of the overall best split
    max_depth=3,  # Maximum depth of the tree is 3
    min_samples_split=4,  # Minimum 4 samples required to split an internal node
    min_samples_leaf=3,  # Minimum 3 samples required to be at a leaf node
    min_weight_fraction_leaf=0.01,  # Minimum weighted fraction of sum total of weights required at a leaf node
    max_features='sqrt',  # Number of features to consider when looking for the best split is the square root of total features
    random_state=44,  # Control randomness of the estimator
    max_leaf_nodes=15,  # Maximum number of leaf nodes is 15
    min_impurity_decrease=0.01,  # A node will be split if this split induces a decrease in impurity greater than or equal to this value
    class_weight='balanced',  # Adjust weights inversely proportional to class frequencies in the input data
    ccp_alpha=0.01,  # Complexity parameter used for Minimal Cost-Complexity Pruning
)
df3["booking_status"].value_counts()
booking_status
Not_Canceled    24390
Canceled        11885
Name: count, dtype: int64
df3["booking_status"] = df3["booking_status"].apply(lambda x: 1 if x == "Canceled" else 0)
#resplit data for the decision tree model
X = df3.drop(["booking_status",], axis=1)
Y = df3["booking_status"]
# Identify object-type columns
object_cols = X.select_dtypes(include=['object','category']).columns
# Convert object-type columns to dummy variables; drop the first category to avoid multicollinearity
# dtype=int ensures the output is integer (numeric 0 and 1) instead of Boolean
X = pd.get_dummies(X, columns=object_cols, dtype=int, drop_first=True)
# Splitting data in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
X, Y, test_size=0.30, random_state=1
)
Y.unique()
array([0, 1])
X_train.head()
| | no_of_adults | no_of_children | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | total_nights | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13662 | 1 | 0 | 0 | 163 | 2018 | 10 | 15 | 0 | 0 | 0 | 115.00 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 26641 | 2 | 0 | 0 | 113 | 2018 | 3 | 31 | 0 | 0 | 0 | 78.15 | 1 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 17835 | 2 | 0 | 0 | 359 | 2018 | 10 | 14 | 0 | 0 | 0 | 78.00 | 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 21485 | 2 | 0 | 0 | 136 | 2018 | 6 | 29 | 0 | 0 | 0 | 85.50 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5670 | 2 | 0 | 0 | 21 | 2018 | 8 | 15 | 0 | 0 | 0 | 151.00 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
We will construct our model using the DecisionTreeClassifier class. By default, it uses the ‘gini’ criterion to decide how to split the data at each node; alternatively, you can choose the ‘entropy’ criterion.
X_train = X_train.astype(float) # Convert all columns to float
X_train.dtypes
no_of_adults                            float64
no_of_children                          float64
required_car_parking_space              float64
lead_time                               float64
arrival_year                            float64
arrival_month                           float64
arrival_date                            float64
repeated_guest                          float64
no_of_previous_cancellations            float64
no_of_previous_bookings_not_canceled    float64
avg_price_per_room                      float64
no_of_special_requests                  float64
total_nights                            float64
type_of_meal_plan_Meal Plan 2           float64
type_of_meal_plan_Meal Plan 3           float64
type_of_meal_plan_Not Selected          float64
room_type_reserved_Room_Type 2          float64
room_type_reserved_Room_Type 3          float64
room_type_reserved_Room_Type 4          float64
room_type_reserved_Room_Type 5          float64
room_type_reserved_Room_Type 6          float64
room_type_reserved_Room_Type 7          float64
market_segment_type_Complementary       float64
market_segment_type_Corporate           float64
market_segment_type_Offline             float64
market_segment_type_Online              float64
dtype: object
# checking the shape of the train and test data
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 25392
Number of rows in test data = 10883
Adding a constant with sm.add_constant is not needed for decision trees.
print("{0:0.2f}% data is in training set".format((len(X_train)/len(df.index)) * 100))
print("{0:0.2f}% data is in test set".format((len(X_test)/len(df.index)) * 100))
70.00% data is in training set
30.00% data is in test set
#confirm percentage of each class in both training and test datasets
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print(' ')
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
booking_status
0    0.67
1    0.33
Name: proportion, dtype: float64
 
Percentage of classes in test set:
booking_status
0    0.68
1    0.32
Name: proportion, dtype: float64
model = DecisionTreeClassifier(random_state=1)
model.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
Model evaluation criterion
Model can make wrong predictions as:
Which case is more important?
How to reduce the losses?
The company would want recall to be maximized: the greater the recall, the fewer False Negatives (cancellations that the model fails to flag).
# defining a function to compute different metrics to check performance of a classification model built using sklearn
def model_performance_classification_sklearn(model, predictors, target):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    # predicting using the independent variables
    pred = model.predict(predictors)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
confusion_matrix_sklearn(model, X_train, y_train)
decision_tree_perf_train_without = model_performance_classification_sklearn(
model, X_train, y_train
)
decision_tree_perf_train_without
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99 | 0.99 | 1.00 | 0.99 |
The model shows an F1 score of 0.99 and misclassifies only 147 bookings. However, it is probably overfitting the training data.
confusion_matrix_sklearn(model, X_test, y_test)
decision_tree_perf_test_without = model_performance_classification_sklearn(
model, X_test, y_test
)
decision_tree_perf_test_without
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.87 | 0.80 | 0.79 | 0.80 |
There is a large gap between training and test performance (F1 drops from 0.99 to 0.80), which indicates overfitting.
If the frequency of class A is 10% and the frequency of class B is 90%, then class B becomes the dominant class and the decision tree becomes biased toward it.
In this case, we will set class_weight = "balanced", which automatically adjusts the weights to be inversely proportional to the class frequencies in the input data.
class_weight is a hyperparameter for the decision tree classifier
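To see what "inversely proportional to the class frequencies" means in numbers, the sketch below applies the formula that scikit-learn documents for the "balanced" option, n_samples / (n_classes * class_count). As an assumption for illustration, it uses the full-dataset class counts printed earlier (24390 Not_Canceled, 11885 Canceled). The model itself is fit on the training split, so its actual weights differ slightly. The function name `balanced_class_weights` is hypothetical.

```python
def balanced_class_weights(class_counts):
    """Weight per class: n_samples / (n_classes * count_of_that_class)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


# Assumption: full-dataset counts, used here only for illustration
counts = {"Not_Canceled": 24390, "Canceled": 11885}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    # weighted total = count * weight; it comes out equal for every class
    print(label, round(weight, 2), round(counts[label] * weight, 1))
```

Each canceled booking counts roughly twice as much as a non-canceled one, so both classes contribute the same total weight when the tree evaluates a split.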
#build the decision tree model
decisiontree = DecisionTreeClassifier(random_state=1, class_weight="balanced")
#fit the model to the training set
decisiontree.fit(X_train, y_train)
#create a confusion matrix
confusion_matrix_sklearn(decisiontree, X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', random_state=1)
decision_tree_perf_train = model_performance_classification_sklearn(
decisiontree, X_train, y_train
)
decision_tree_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.99 | 1.00 | 0.98 | 0.99 |
The model misclassifies only 156 bookings, but it is most likely overfitting the training data.
#create a confusion matrix for the test set
confusion_matrix_sklearn(decisiontree, X_test, y_test)
decision_tree_perf_test = model_performance_classification_sklearn(
decisiontree, X_test, y_test
)
decision_tree_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.86 | 0.81 | 0.78 | 0.79 |
There is a large gap between the model's performance on the training set and on the test set (F1 of 0.99 versus 0.79), which means the model is overfitting.
## creating a list of column names
feature_names = X_train.columns.to_list()
feature_names
['no_of_adults', 'no_of_children', 'required_car_parking_space', 'lead_time', 'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest', 'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled', 'avg_price_per_room', 'no_of_special_requests', 'total_nights', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Meal Plan 3', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 3', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
# Text report showing the rules of a decision tree -
print(tree.export_text(decisiontree, feature_names=feature_names, show_weights=True))
|--- lead_time <= 151.50
| |--- no_of_special_requests <= 0.50
| | |--- market_segment_type_Online <= 0.50
| | | |--- lead_time <= 90.50
| | | | |--- total_nights <= 5.50
| | | | | |--- avg_price_per_room <= 201.50
| | | | | | |--- lead_time <= 74.50
| | | | | | | |--- arrival_month <= 5.50
| | | | | | | | |--- arrival_date <= 27.50
| | | | | | | | | |--- lead_time <= 59.50
| | | | | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 16
| | | | | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | |--- lead_time > 59.50
| | | | | | | | | | |--- arrival_date <= 16.50
| | | | | | | | | | | |--- weights: [19.38, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 16.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | |--- arrival_date > 27.50
| | | | | | | | | |--- avg_price_per_room <= 61.00
| | | | | | | | | | |--- avg_price_per_room <= 59.75
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 59.75
| | | | | | | | | | | |--- weights: [0.00, 50.10] class: 1
| | | | | | | | | |--- avg_price_per_room > 61.00
| | | | | | | | | | |--- arrival_date <= 29.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_date > 29.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- arrival_month > 5.50
| | | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | | |--- repeated_guest <= 0.50
| | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | |--- repeated_guest > 0.50
| | | | | | | | | | |--- weights: [132.71, 0.00] class: 0
| | | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | | |--- avg_price_per_room <= 50.00
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- weights: [14.17, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 50.00
| | | | | | | | | | |--- arrival_date <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 1.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | |--- lead_time > 74.50
| | | | | | | |--- lead_time <= 78.50
| | | | | | | | |--- avg_price_per_room <= 79.78
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- weights: [12.67, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 79.78
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- weights: [0.00, 28.84] class: 1
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- total_nights <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- total_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- lead_time > 78.50
| | | | | | | | |--- total_nights <= 3.50
| | | | | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | | | | |--- total_nights <= 2.50
| | | | | | | | | | | |--- weights: [82.01, 0.00] class: 0
| | | | | | | | | | |--- total_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | | | | |--- lead_time <= 86.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 86.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- total_nights > 3.50
| | | | | | | | | |--- arrival_date <= 24.50
| | | | | | | | | | |--- arrival_date <= 8.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_date > 8.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_date > 24.50
| | | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | |--- avg_price_per_room > 201.50
| | | | | | |--- arrival_date <= 28.00
| | | | | | | |--- weights: [0.00, 25.81] class: 1
| | | | | | |--- arrival_date > 28.00
| | | | | | | |--- avg_price_per_room <= 240.38
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 240.38
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | |--- total_nights > 5.50
| | | | | |--- avg_price_per_room <= 92.80
| | | | | | |--- arrival_date <= 22.50
| | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | |--- lead_time <= 72.50
| | | | | | | | | |--- lead_time <= 33.00
| | | | | | | | | | |--- arrival_date <= 16.00
| | | | | | | | | | | |--- weights: [18.64, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 16.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- lead_time > 33.00
| | | | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- lead_time > 72.50
| | | | | | | | | |--- weights: [14.91, 0.00] class: 0
| | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | |--- arrival_date > 22.50
| | | | | | | |--- weights: [23.86, 0.00] class: 0
| | | | | |--- avg_price_per_room > 92.80
| | | | | | |--- arrival_month <= 8.50
| | | | | | | |--- arrival_date <= 21.00
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | | |--- total_nights <= 13.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- total_nights > 13.50
| | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | | |--- weights: [0.00, 74.39] class: 1
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- total_nights <= 6.50
| | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | |--- total_nights > 6.50
| | | | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1
| | | | | | | |--- arrival_date > 21.00
| | | | | | | | |--- weights: [5.22, 0.00] class: 0
| | | | | | |--- arrival_month > 8.50
| | | | | | | |--- weights: [7.46, 0.00] class: 0
| | | |--- lead_time > 90.50
| | | | |--- lead_time <= 117.50
| | | | | |--- avg_price_per_room <= 93.58
| | | | | | |--- avg_price_per_room <= 75.07
| | | | | | | |--- arrival_month <= 7.50
| | | | | | | | |--- avg_price_per_room <= 58.75
| | | | | | | | | |--- weights: [10.44, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 58.75
| | | | | | | | | |--- total_nights <= 3.50
| | | | | | | | | | |--- lead_time <= 104.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 104.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- total_nights > 3.50
| | | | | | | | | | |--- arrival_date <= 23.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 23.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | |--- arrival_month > 7.50
| | | | | | | | |--- arrival_date <= 29.50
| | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 71.12
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 71.12
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | |--- arrival_date <= 6.50
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 6.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- arrival_date > 29.50
| | | | | | | | | |--- lead_time <= 98.00
| | | | | | | | | | |--- weights: [4.47, 0.00] class: 0
| | | | | | | | | |--- lead_time > 98.00
| | | | | | | | | | |--- avg_price_per_room <= 63.25
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 63.25
| | | | | | | | | | | |--- weights: [0.00, 13.66] class: 1
| | | | | | |--- avg_price_per_room > 75.07
| | | | | | | |--- arrival_month <= 3.50
| | | | | | | | |--- avg_price_per_room <= 88.50
| | | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 80.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 80.50
| | | | | | | | | | | |--- weights: [17.15, 0.00] class: 0
| | | | | | | | | |--- total_nights > 1.50
| | | | | | | | | | |--- weights: [37.28, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 88.50
| | | | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | |--- arrival_month > 3.50
| | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | |--- avg_price_per_room <= 80.38
| | | | | | | | | | |--- weights: [0.00, 16.70] class: 1
| | | | | | | | | |--- avg_price_per_room > 80.38
| | | | | | | | | | |--- total_nights <= 4.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- total_nights > 4.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 86.00
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 86.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- arrival_date <= 22.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 22.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | |--- avg_price_per_room > 93.58
| | | | | | |--- arrival_date <= 11.50
| | | | | | | |--- arrival_month <= 7.50
| | | | | | | | |--- lead_time <= 108.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- lead_time <= 102.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 102.00
| | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1
| | | | | | | | |--- lead_time > 108.50
| | | | | | | | | |--- total_nights <= 2.50
| | | | | | | | | | |--- weights: [8.95, 1.52] class: 0
| | | | | | | | | |--- total_nights > 2.50
| | | | | | | | | | |--- weights: [3.73, 0.00] class: 0
| | | | | | | |--- arrival_month > 7.50
| | | | | | | | |--- lead_time <= 110.50
| | | | | | | | | |--- avg_price_per_room <= 116.75
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 116.75
| | | | | | | | | | |--- weights: [1.49, 1.52] class: 1
| | | | | | | | |--- lead_time > 110.50
| | | | | | | | | |--- total_nights <= 2.00
| | | | | | | | | | |--- weights: [0.00, 12.14] class: 1
| | | | | | | | | |--- total_nights > 2.00
| | | | | | | | | | |--- lead_time <= 112.00
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 112.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- arrival_date > 11.50
| | | | | | | |--- avg_price_per_room <= 102.09
| | | | | | | | |--- arrival_date <= 14.50
| | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | |--- arrival_date > 14.50
| | | | | | | | | |--- arrival_month <= 2.50
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 2.50
| | | | | | | | | | |--- avg_price_per_room <= 95.44
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 95.44
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- avg_price_per_room > 102.09
| | | | | | | | |--- avg_price_per_room <= 109.50
| | | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 108.50
| | | | | | | | | | | |--- weights: [0.00, 16.70] class: 1
| | | | | | | | | | |--- avg_price_per_room > 108.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- total_nights > 1.50
| | | | | | | | | | |--- arrival_month <= 6.00
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 6.00
| | | | | | | | | | | |--- weights: [31.31, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 109.50
| | | | | | | | | |--- avg_price_per_room <= 124.25
| | | | | | | | | | |--- arrival_date <= 19.50
| | | | | | | | | | | |--- weights: [0.00, 71.35] class: 1
| | | | | | | | | | |--- arrival_date > 19.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- avg_price_per_room > 124.25
| | | | | | | | | | |--- arrival_date <= 27.50
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 27.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | |--- lead_time > 117.50
| | | | | |--- no_of_adults <= 1.50
| | | | | | |--- avg_price_per_room <= 122.00
| | | | | | | |--- weights: [105.12, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 122.00
| | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | |--- no_of_adults > 1.50
| | | | | | |--- arrival_month <= 11.50
| | | | | | | |--- arrival_date <= 7.50
| | | | | | | | |--- lead_time <= 150.50
| | | | | | | | | |--- arrival_month <= 5.00
| | | | | | | | | | |--- weights: [24.60, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 5.00
| | | | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | | | |--- weights: [14.17, 0.00] class: 0
| | | | | | | | |--- lead_time > 150.50
| | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | |--- arrival_date > 7.50
| | | | | | | | |--- arrival_date <= 24.50
| | | | | | | | | |--- total_nights <= 3.50
| | | | | | | | | | |--- arrival_date <= 23.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- arrival_date > 23.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- total_nights > 3.50
| | | | | | | | | | |--- avg_price_per_room <= 74.12
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- avg_price_per_room > 74.12
| | | | | | | | | | | |--- weights: [20.13, 0.00] class: 0
| | | | | | | | |--- arrival_date > 24.50
| | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 57.25
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- avg_price_per_room > 57.25
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | | | |--- weights: [0.00, 4.55] class: 1
| | | | | | |--- arrival_month > 11.50
| | | | | | | |--- weights: [48.46, 0.00] class: 0
| | |--- market_segment_type_Online > 0.50
| | | |--- lead_time <= 13.50
| | | | |--- avg_price_per_room <= 99.44
| | | | | |--- arrival_month <= 1.50
| | | | | | |--- weights: [92.45, 0.00] class: 0
| | | | | |--- arrival_month > 1.50
| | | | | | |--- arrival_month <= 8.50
| | | | | | | |--- total_nights <= 2.50
| | | | | | | | |--- lead_time <= 5.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- arrival_month <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_month > 2.50
| | | | | | | | | | | |--- weights: [28.33, 0.00] class: 0
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- avg_price_per_room <= 74.40
| | | | | | | | | | | |--- weights: [17.89, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 74.40
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | |--- lead_time > 5.50
| | | | | | | | | |--- arrival_date <= 3.50
| | | | | | | | | | |--- weights: [7.46, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 3.50
| | | | | | | | | | |--- avg_price_per_room <= 68.38
| | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 68.38
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | |--- total_nights > 2.50
| | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | |--- avg_price_per_room <= 85.50
| | | | | | | | | | |--- no_of_adults <= 0.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- no_of_adults > 0.50
| | | | | | | | | | | |--- weights: [0.00, 22.77] class: 1
| | | | | | | | | |--- avg_price_per_room > 85.50
| | | | | | | | | | |--- avg_price_per_room <= 89.00
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 89.00
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | |--- lead_time <= 2.50
| | | | | | | | | | |--- arrival_date <= 20.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_date > 20.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- lead_time > 2.50
| | | | | | | | | | |--- arrival_date <= 13.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 13.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | |--- arrival_month > 8.50
| | | | | | | |--- total_nights <= 5.50
| | | | | | | | |--- avg_price_per_room <= 94.66
| | | | | | | | | |--- arrival_date <= 1.50
| | | | | | | | | | |--- lead_time <= 11.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 11.00
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | |--- arrival_date > 1.50
| | | | | | | | | | |--- avg_price_per_room <= 90.17
| | | | | | | | | | | |--- weights: [116.31, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 90.17
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- avg_price_per_room > 94.66
| | | | | | | | | |--- avg_price_per_room <= 95.10
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 95.10
| | | | | | | | | | |--- weights: [14.91, 0.00] class: 0
| | | | | | | |--- total_nights > 5.50
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- lead_time <= 3.50
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- lead_time > 3.50
| | | | | | | | | | |--- weights: [0.00, 9.11] class: 1
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | |--- avg_price_per_room > 99.44
| | | | | |--- lead_time <= 3.50
| | | | | | |--- avg_price_per_room <= 202.67
| | | | | | | |--- total_nights <= 6.50
| | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 163.00
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- avg_price_per_room > 163.00
| | | | | | | | | | | |--- weights: [8.95, 0.00] class: 0
| | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | |--- weights: [10.44, 0.00] class: 0
| | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | |--- arrival_date <= 20.50
| | | | | | | | | | |--- avg_price_per_room <= 132.39
| | | | | | | | | | | |--- weights: [60.39, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 132.39
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- arrival_date > 20.50
| | | | | | | | | | |--- arrival_date <= 24.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 24.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | |--- total_nights > 6.50
| | | | | | | | |--- weights: [0.00, 6.07] class: 1
| | | | | | |--- avg_price_per_room > 202.67
| | | | | | | |--- arrival_month <= 11.00
| | | | | | | | |--- weights: [0.00, 22.77] class: 1
| | | | | | | |--- arrival_month > 11.00
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | |--- lead_time > 3.50
| | | | | | |--- arrival_month <= 8.50
| | | | | | | |--- avg_price_per_room <= 119.25
| | | | | | | | |--- avg_price_per_room <= 118.50
| | | | | | | | | |--- lead_time <= 12.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | |--- lead_time > 12.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- avg_price_per_room > 118.50
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- lead_time <= 4.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- lead_time > 4.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- weights: [7.46, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 119.25
| | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | |--- total_nights > 1.50
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- no_of_adults > 2.50
| | | | | | | | | | |--- lead_time <= 5.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 5.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | |--- arrival_month > 8.50
| | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | |--- lead_time <= 9.00
| | | | | | | | | | |--- weights: [3.73, 0.00] class: 0
| | | | | | | | | |--- lead_time > 9.00
| | | | | | | | | | |--- lead_time <= 10.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- lead_time > 10.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | |--- total_nights > 1.50
| | | | | | | | | |--- weights: [21.62, 0.00] class: 0
| | | | | | | |--- arrival_year > 2017.50
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- arrival_date <= 14.00
| | | | | | | | | | |--- arrival_date <= 1.50
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 1.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | |--- arrival_date > 14.00
| | | | | | | | | | |--- avg_price_per_room <= 208.67
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 208.67
| | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- weights: [15.66, 0.00] class: 0
| | | |--- lead_time > 13.50
| | | | |--- required_car_parking_space <= 0.50
| | | | | |--- avg_price_per_room <= 71.92
| | | | | | |--- avg_price_per_room <= 59.43
| | | | | | | |--- lead_time <= 84.50
| | | | | | | | |--- arrival_date <= 17.50
| | | | | | | | | |--- lead_time <= 51.50
| | | | | | | | | | |--- avg_price_per_room <= 21.67
| | | | | | | | | | | |--- weights: [6.71, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 21.67
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- lead_time > 51.50
| | | | | | | | | | |--- weights: [12.67, 0.00] class: 0
| | | | | | | | |--- arrival_date > 17.50
| | | | | | | | | |--- weights: [23.11, 0.00] class: 0
| | | | | | | |--- lead_time > 84.50
| | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | |--- arrival_date <= 27.00
| | | | | | | | | | |--- lead_time <= 131.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 131.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- arrival_date > 27.00
| | | | | | | | | | |--- total_nights <= 2.00
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- total_nights > 2.00
| | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | |--- weights: [10.44, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 59.43
| | | | | | | |--- lead_time <= 25.50
| | | | | | | | |--- total_nights <= 4.50
| | | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 69.06
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 69.06
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | |--- total_nights > 1.50
| | | | | | | | | | |--- weights: [14.91, 0.00] class: 0
| | | | | | | | |--- total_nights > 4.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_date <= 4.00
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 4.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | |--- lead_time > 25.50
| | | | | | | | |--- avg_price_per_room <= 71.34
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- lead_time <= 68.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- lead_time > 68.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- lead_time <= 102.00
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- lead_time > 102.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- avg_price_per_room > 71.34
| | | | | | | | | |--- weights: [11.18, 0.00] class: 0
| | | | | |--- avg_price_per_room > 71.92
| | | | | | |--- arrival_year <= 2017.50
| | | | | | | |--- lead_time <= 65.50
| | | | | | | | |--- avg_price_per_room <= 120.45
| | | | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- no_of_adults > 2.50
| | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | |--- avg_price_per_room > 120.45
| | | | | | | | | |--- arrival_date <= 17.50
| | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 17.50
| | | | | | | | | | |--- total_nights <= 2.50
| | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | | |--- total_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | |--- lead_time > 65.50
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | |--- arrival_date <= 27.50
| | | | | | | | | | |--- avg_price_per_room <= 75.75
| | | | | | | | | | | |--- weights: [0.00, 10.63] class: 1
| | | | | | | | | | |--- avg_price_per_room > 75.75
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | |--- arrival_date > 27.50
| | | | | | | | | | |--- weights: [3.73, 0.00] class: 0
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | | | |--- weights: [0.00, 60.72] class: 1
| | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | |--- arrival_year > 2017.50
| | | | | | | |--- avg_price_per_room <= 104.31
| | | | | | | | |--- lead_time <= 25.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | | |--- weights: [16.40, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [23.11, 0.00] class: 0
| | | | | | | | |--- lead_time > 25.50
| | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | |--- avg_price_per_room > 104.31
| | | | | | | | |--- arrival_month <= 10.50
| | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | | | |--- avg_price_per_room <= 195.30
| | | | | | | | | | | |--- truncated branch of depth 25
| | | | | | | | | | |--- avg_price_per_room > 195.30
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | | | |--- arrival_date <= 22.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 22.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- arrival_month > 10.50
| | | | | | | | | |--- avg_price_per_room <= 168.06
| | | | | | | | | | |--- lead_time <= 22.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- lead_time > 22.00
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- avg_price_per_room > 168.06
| | | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | |--- required_car_parking_space > 0.50
| | | | | |--- total_nights <= 11.00
| | | | | | |--- weights: [48.46, 0.00] class: 0
| | | | | |--- total_nights > 11.00
| | | | | | |--- weights: [0.00, 1.52] class: 1
| |--- no_of_special_requests > 0.50
| | |--- no_of_special_requests <= 1.50
| | | |--- market_segment_type_Online <= 0.50
| | | | |--- lead_time <= 102.50
| | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | |--- total_nights <= 15.00
| | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | |--- lead_time <= 91.50
| | | | | | | | | |--- avg_price_per_room <= 129.50
| | | | | | | | | | |--- weights: [632.23, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 129.50
| | | | | | | | | | |--- avg_price_per_room <= 131.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 131.50
| | | | | | | | | | | |--- weights: [20.13, 0.00] class: 0
| | | | | | | | |--- lead_time > 91.50
| | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | | |--- weights: [5.22, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | | |--- weights: [26.84, 0.00] class: 0
| | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | |--- arrival_date <= 16.50
| | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | | | |--- arrival_date > 16.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | |--- total_nights <= 4.50
| | | | | | | | | |--- weights: [8.95, 0.00] class: 0
| | | | | | | | |--- total_nights > 4.50
| | | | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | |--- total_nights > 15.00
| | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | |--- lead_time <= 63.00
| | | | | | | |--- market_segment_type_Corporate <= 0.50
| | | | | | | | |--- weights: [13.42, 0.00] class: 0
| | | | | | | |--- market_segment_type_Corporate > 0.50
| | | | | | | | |--- arrival_date <= 14.50
| | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | |--- arrival_date > 14.50
| | | | | | | | | |--- weights: [1.49, 1.52] class: 1
| | | | | | |--- lead_time > 63.00
| | | | | | | |--- weights: [0.00, 7.59] class: 1
| | | | |--- lead_time > 102.50
| | | | | |--- lead_time <= 104.50
| | | | | | |--- lead_time <= 103.50
| | | | | | | |--- no_of_children <= 0.50
| | | | | | | | |--- weights: [3.73, 0.00] class: 0
| | | | | | | |--- no_of_children > 0.50
| | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | |--- lead_time > 103.50
| | | | | | | |--- weights: [0.00, 4.55] class: 1
| | | | | |--- lead_time > 104.50
| | | | | | |--- lead_time <= 150.50
| | | | | | | |--- avg_price_per_room <= 141.75
| | | | | | | | |--- total_nights <= 3.50
| | | | | | | | | |--- avg_price_per_room <= 81.00
| | | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | | |--- weights: [5.22, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- avg_price_per_room > 81.00
| | | | | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- no_of_adults > 2.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- total_nights > 3.50
| | | | | | | | | |--- weights: [20.13, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 141.75
| | | | | | | | |--- total_nights <= 5.00
| | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | |--- total_nights > 5.00
| | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | |--- lead_time > 150.50
| | | | | | | |--- total_nights <= 2.50
| | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | |--- total_nights > 2.50
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | |--- market_segment_type_Online > 0.50
| | | | |--- lead_time <= 8.50
| | | | | |--- lead_time <= 4.50
| | | | | | |--- total_nights <= 14.00
| | | | | | | |--- avg_price_per_room <= 219.86
| | | | | | | | |--- total_nights <= 6.50
| | | | | | | | | |--- avg_price_per_room <= 157.64
| | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 157.64
| | | | | | | | | | |--- avg_price_per_room <= 158.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- avg_price_per_room > 158.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- total_nights > 6.50
| | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | | |--- arrival_month <= 4.00
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 4.00
| | | | | | | | | | | |--- weights: [5.96, 0.00] class: 0
| | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | | |--- arrival_date <= 6.50
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 6.50
| | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | |--- avg_price_per_room > 219.86
| | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | |--- weights: [3.73, 0.00] class: 0
| | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | |--- arrival_date <= 14.50
| | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | | |--- arrival_date > 14.50
| | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0
| | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | |--- total_nights > 14.00
| | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | |--- lead_time > 4.50
| | | | | | |--- arrival_date <= 13.50
| | | | | | | |--- arrival_month <= 9.50
| | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 88.39
| | | | | | | | | | |--- arrival_date <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 3.50
| | | | | | | | | | | |--- weights: [11.93, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 88.39
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50
| | | | | | | | | |--- avg_price_per_room <= 94.48
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | | | |--- avg_price_per_room > 94.48
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | |--- arrival_month > 9.50
| | | | | | | | |--- avg_price_per_room <= 157.12
| | | | | | | | | |--- weights: [32.06, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 157.12
| | | | | | | | | |--- total_nights <= 3.00
| | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | |--- total_nights > 3.00
| | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | |--- arrival_date > 13.50
| | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | |--- avg_price_per_room <= 139.57
| | | | | | | | | |--- avg_price_per_room <= 101.59
| | | | | | | | | | |--- avg_price_per_room <= 101.22
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 101.22
| | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | | |--- avg_price_per_room > 101.59
| | | | | | | | | | |--- weights: [57.41, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 139.57
| | | | | | | | | |--- arrival_date <= 15.50
| | | | | | | | | | |--- total_nights <= 1.50
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | | |--- total_nights > 1.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | 
|--- arrival_date > 15.50 | | | | | | | | | | |--- avg_price_per_room <= 140.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 140.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- avg_price_per_room <= 126.33 | | | | | | | | | |--- arrival_date <= 21.50 | | | | | | | | | | |--- weights: [17.89, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 21.50 | | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | | |--- weights: [12.67, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 126.33 | | | | | | | | | |--- total_nights <= 1.50 | | | | | | | | | | |--- avg_price_per_room <= 128.50 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 128.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- total_nights > 1.50 | | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | |--- lead_time > 8.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_per_room <= 118.55 | | | | | | | |--- lead_time <= 61.50 | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | |--- total_nights <= 6.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [65.61, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | |--- total_nights > 6.50 | | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | |--- 
arrival_month > 11.50 | | | | | | | | | |--- total_nights <= 12.50 | | | | | | | | | | |--- weights: [126.74, 0.00] class: 0 | | | | | | | | | |--- total_nights > 12.50 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | |--- lead_time > 61.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | | |--- lead_time <= 66.50 | | | | | | | | | | | |--- weights: [5.22, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 66.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 71.93 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 71.93 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- total_nights <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- total_nights > 2.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | |--- avg_price_per_room > 118.55 | | | | | | | |--- arrival_month <= 8.50 | | | | | | | | |--- arrival_date <= 19.50 | | | | | | | | | |--- total_nights <= 9.50 | | | | | | | | | | |--- avg_price_per_room <= 177.15 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- avg_price_per_room > 177.15 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- total_nights > 9.50 | | | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | | | |--- arrival_date > 19.50 | | | | | | | | | |--- arrival_date <= 27.50 | | | | | | | | | | |--- avg_price_per_room <= 121.20 | | | | | | | | 
| | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 121.20 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- arrival_date > 27.50 | | | | | | | | | | |--- lead_time <= 55.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- lead_time > 55.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- arrival_month > 8.50 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | | | |--- lead_time <= 14.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 14.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_month > 9.50 | | | | | | | | | | |--- weights: [37.28, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | |--- avg_price_per_room <= 119.20 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_per_room > 119.20 | | | | | | | | | | | |--- truncated branch of depth 25 | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | |--- lead_time <= 100.00 | | | | | | | | | | | |--- weights: [49.95, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 100.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- total_nights <= 10.50 | | | | | | | |--- weights: [134.20, 0.00] class: 0 | | | | | | |--- total_nights > 10.50 | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | |--- no_of_special_requests > 1.50 | | | |--- lead_time <= 90.50 | | | | |--- total_nights <= 4.50 | | | | | |--- total_nights <= 3.50 | | | | | | |--- weights: [1259.24, 0.00] class: 0 | | | | | |--- total_nights > 3.50 | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | |--- avg_price_per_room <= 90.05 | | | | | | | | |--- lead_time <= 48.00 | | | | | | | | | |--- 
arrival_month <= 2.50 | | | | | | | | | | |--- lead_time <= 20.00 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | | |--- lead_time > 20.00 | | | | | | | | | | | |--- weights: [2.98, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 2.50 | | | | | | | | | | |--- weights: [45.48, 0.00] class: 0 | | | | | | | | |--- lead_time > 48.00 | | | | | | | | | |--- avg_price_per_room <= 89.85 | | | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | | | |--- weights: [13.42, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 89.85 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | |--- avg_price_per_room > 90.05 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- weights: [211.74, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- lead_time <= 54.50 | | | | | | | | | | | |--- weights: [12.67, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 54.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- lead_time <= 28.50 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- weights: [10.44, 0.00] class: 0 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- lead_time > 28.50 | | | | | | | | | | |--- lead_time <= 30.50 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | | |--- lead_time > 30.50 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | |--- lead_time <= 31.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- 
weights: [7.46, 0.00] class: 0 | | | | | | | |--- lead_time > 31.00 | | | | | | | | |--- avg_price_per_room <= 159.42 | | | | | | | | | |--- weights: [1.49, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 159.42 | | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | |--- total_nights > 4.50 | | | | | |--- total_nights <= 12.00 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- total_nights <= 6.50 | | | | | | | | |--- avg_price_per_room <= 144.28 | | | | | | | | | |--- avg_price_per_room <= 134.74 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- avg_price_per_room > 134.74 | | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0 | | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- avg_price_per_room > 144.28 | | | | | | | | | |--- weights: [35.79, 0.00] class: 0 | | | | | | | |--- total_nights > 6.50 | | | | | | | | |--- lead_time <= 9.00 | | | | | | | | | |--- weights: [9.69, 0.00] class: 0 | | | | | | | | |--- lead_time > 9.00 | | | | | | | | | |--- lead_time <= 34.50 | | | | | | | | | | |--- arrival_date <= 9.50 | | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 9.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- lead_time > 34.50 | | | | | | | | | | |--- lead_time <= 72.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 72.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [51.44, 0.00] class: 0 | | | | | |--- total_nights > 12.00 | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | |--- lead_time > 90.50 | | | | |--- 
no_of_special_requests <= 2.50 | | | | | |--- arrival_month <= 8.50 | | | | | | |--- avg_price_per_room <= 202.95 | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- arrival_date <= 26.00 | | | | | | | | | | | |--- weights: [0.00, 7.59] class: 1 | | | | | | | | | | |--- arrival_date > 26.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- arrival_month > 7.50 | | | | | | | | | |--- arrival_date <= 24.50 | | | | | | | | | | |--- lead_time <= 98.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 98.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | |--- lead_time <= 150.50 | | | | | | | | | |--- total_nights <= 5.50 | | | | | | | | | | |--- total_nights <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- total_nights > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- total_nights > 5.50 | | | | | | | | | | |--- arrival_date <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 5.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time > 150.50 | | | | | | | | | |--- avg_price_per_room <= 131.97 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | |--- avg_price_per_room > 131.97 | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | | |--- avg_price_per_room > 202.95 | | | | | | | |--- no_of_children <= 1.00 | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | | | |--- no_of_children > 1.00 | | | | | | | | |--- weights: [0.00, 7.59] class: 1 | | | | 
| |--- arrival_month > 8.50 | | | | | | |--- avg_price_per_room <= 153.15 | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | |--- avg_price_per_room <= 71.12 | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 71.12 | | | | | | | | | |--- avg_price_per_room <= 90.42 | | | | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- arrival_month > 11.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- avg_price_per_room > 90.42 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 153.15 | | | | | | | |--- arrival_date <= 22.50 | | | | | | | | |--- weights: [8.20, 0.00] class: 0 | | | | | | | |--- arrival_date > 22.50 | | | | | | | | |--- lead_time <= 106.50 | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | |--- lead_time > 106.50 | | | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | |--- arrival_date > 23.50 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | |--- no_of_special_requests > 2.50 | | | | | |--- weights: [67.10, 0.00] class: 0 |--- lead_time > 151.50 | |--- avg_price_per_room <= 100.04 | | |--- no_of_special_requests <= 0.50 | | | |--- no_of_adults <= 1.50 | | | | |--- market_segment_type_Online <= 0.50 | | | | | |--- lead_time <= 163.50 | | | | | | |--- arrival_month <= 5.00 | | | | | | | |--- weights: [2.98, 0.00] class: 0 | | | | | | |--- arrival_month > 5.00 | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | |--- weights: [0.75, 1.52] class: 1 | | | | | | | |--- avg_price_per_room > 80.00 | | | 
| | | | | |--- weights: [0.00, 22.77] class: 1 | | | | | |--- lead_time > 163.50 | | | | | | |--- lead_time <= 341.00 | | | | | | | |--- lead_time <= 173.00 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- total_nights <= 1.50 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- total_nights > 1.50 | | | | | | | | | | |--- total_nights <= 3.00 | | | | | | | | | | | |--- weights: [45.48, 9.11] class: 0 | | | | | | | | | | |--- total_nights > 3.00 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- total_nights <= 3.00 | | | | | | | | | | |--- weights: [0.00, 13.66] class: 1 | | | | | | | | | |--- total_nights > 3.00 | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | |--- lead_time > 173.00 | | | | | | | | |--- arrival_month <= 5.50 | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | |--- weights: [6.71, 0.00] class: 0 | | | | | | | | |--- arrival_month > 5.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- avg_price_per_room <= 55.21 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- avg_price_per_room > 55.21 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- lead_time <= 231.50 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 231.50 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | |--- lead_time > 341.00 | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | |--- avg_price_per_room <= 88.33 | | | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | | | | |--- avg_price_per_room > 88.33 | | | | | | | | | |--- weights: [0.75, 1.52] class: 1 | | | | | | | |--- arrival_date > 8.50 | | | | | | | | |--- arrival_date <= 24.50 | 
| | | | | | | | |--- avg_price_per_room <= 80.00 | | | | | | | | | | |--- weights: [3.73, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 80.00 | | | | | | | | | | |--- lead_time <= 381.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time > 381.50 | | | | | | | | | | | |--- weights: [2.24, 3.04] class: 1 | | | | | | | | |--- arrival_date > 24.50 | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | |--- market_segment_type_Online > 0.50 | | | | | |--- avg_price_per_room <= 2.50 | | | | | | |--- lead_time <= 285.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | |--- lead_time > 285.50 | | | | | | | |--- arrival_month <= 9.50 | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | |--- arrival_month > 9.50 | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | |--- avg_price_per_room > 2.50 | | | | | | |--- arrival_date <= 29.50 | | | | | | | |--- weights: [0.00, 88.05] class: 1 | | | | | | |--- arrival_date > 29.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- weights: [0.00, 9.11] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | |--- no_of_adults > 1.50 | | | | |--- avg_price_per_room <= 82.47 | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 197.36] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- total_nights <= 1.50 | | | | | | | | |--- arrival_date <= 14.50 | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | |--- arrival_date > 14.50 | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | |--- total_nights > 1.50 | | | | | | | | |--- avg_price_per_room <= 80.51 | | | | | | | | | |--- total_nights <= 3.50 | | | | 
| | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 19.74] class: 1 | | | | | | | | | |--- total_nights > 3.50 | | | | | | | | | | |--- weights: [0.00, 57.69] class: 1 | | | | | | | | |--- avg_price_per_room > 80.51 | | | | | | | | | |--- avg_price_per_room <= 81.43 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 81.43 | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- lead_time <= 244.00 | | | | | | | | |--- total_nights <= 2.50 | | | | | | | | | |--- lead_time <= 166.50 | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | | |--- lead_time > 166.50 | | | | | | | | | | |--- arrival_date <= 19.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- arrival_date > 19.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- total_nights > 2.50 | | | | | | | | | |--- arrival_date <= 11.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | |--- arrival_date > 11.50 | | | | | | | | | | |--- arrival_date <= 15.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- arrival_date > 15.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | |--- lead_time > 244.00 | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | |--- weights: [25.35, 0.00] class: 0 | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | |--- avg_price_per_room <= 80.38 | | | | | | | | | | |--- avg_price_per_room <= 76.00 | | | | | | | | | | | |--- 
truncated branch of depth 9 | | | | | | | | | | |--- avg_price_per_room > 76.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- avg_price_per_room > 80.38 | | | | | | | | | | |--- weights: [7.46, 0.00] class: 0 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- weights: [46.22, 0.00] class: 0 | | | | |--- avg_price_per_room > 82.47 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- lead_time <= 324.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | | | |--- weights: [0.00, 499.46] class: 1 | | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- weights: [0.00, 19.74] class: 1 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- arrival_date <= 15.00 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 15.00 | | | | | | | | | | |--- weights: [4.47, 0.00] class: 0 | | | | | | |--- lead_time > 324.50 | | | | | | | |--- avg_price_per_room <= 89.00 | | | | | | | | |--- weights: [5.96, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 89.00 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- weights: [0.00, 6.07] class: 1 | | | | | | | | |--- market_segment_type_Offline > 
0.50 | | | | | | | | | |--- weights: [0.75, 7.59] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- weights: [5.22, 0.00] class: 0 | | |--- no_of_special_requests > 0.50 | | | |--- market_segment_type_Offline <= 0.50 | | | | |--- lead_time <= 180.50 | | | | | |--- lead_time <= 159.50 | | | | | | |--- arrival_month <= 8.50 | | | | | | | |--- lead_time <= 152.50 | | | | | | | | |--- arrival_date <= 23.00 | | | | | | | | | |--- arrival_date <= 4.50 | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | | | | | |--- arrival_date > 4.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | |--- arrival_date > 23.00 | | | | | | | | | |--- avg_price_per_room <= 87.39 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 87.39 | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0 | | | | | | | |--- lead_time > 152.50 | | | | | | | | |--- lead_time <= 156.50 | | | | | | | | | |--- weights: [8.95, 0.00] class: 0 | | | | | | | | |--- lead_time > 156.50 | | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 10.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | |--- arrival_month > 8.50 | | | | | | | |--- arrival_date <= 23.50 | | | | | | | | |--- avg_price_per_room <= 87.12 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 87.12 | | | | | | | | | |--- 
avg_price_per_room <= 89.38 | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 89.38 | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0 | | | | | | | |--- arrival_date > 23.50 | | | | | | | | |--- no_of_adults <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | |--- no_of_adults > 0.50 | | | | | | | | | |--- weights: [0.00, 10.63] class: 1 | | | | | |--- lead_time > 159.50 | | | | | | |--- no_of_adults <= 0.50 | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | |--- no_of_adults > 0.50 | | | | | | | |--- avg_price_per_room <= 93.44 | | | | | | | | |--- arrival_date <= 28.50 | | | | | | | | | |--- arrival_year <= 2017.50 | | | | | | | | | | |--- arrival_date <= 25.50 | | | | | | | | | | | |--- weights: [2.24, 0.00] class: 0 | | | | | | | | | | |--- arrival_date > 25.50 | | | | | | | | | | | |--- weights: [0.00, 1.52] class: 1 | | | | | | | | | |--- arrival_year > 2017.50 | | | | | | | | | | |--- total_nights <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- total_nights > 1.50 | | | | | | | | | | | |--- weights: [48.46, 0.00] class: 0 | | | | | | | | |--- arrival_date > 28.50 | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | |--- weights: [0.00, 3.04] class: 1 | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 93.44 | | | | | | | | |--- lead_time <= 178.50 | | | | | | | | | |--- arrival_date <= 18.50 | | | | | | | | | | |--- lead_time <= 170.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time > 170.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- arrival_date > 18.50 | | | | | | | | | | |--- weights: [13.42, 0.00] class: 0 | | | | | | | | |--- lead_time > 178.50 | | | | | | | | | |--- lead_time <= 179.50 | | | | | | | | | | |--- weights: [0.00, 4.55] class: 1 | | | | 
| | | | | |--- lead_time > 179.50
| | | | | | | | | | |--- avg_price_per_room <= 97.82
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 97.82
| | | | | | | | | | | |--- weights: [2.98, 1.52] class: 0
| | | | |--- lead_time > 180.50
| | | | | |--- total_nights <= 3.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- avg_price_per_room <= 66.47
| | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 66.47
| | | | | | | | |--- lead_time <= 187.50
| | | | | | | | | |--- arrival_month <= 4.00
| | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 4.00
| | | | | | | | | | |--- avg_price_per_room <= 78.30
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 78.30
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- lead_time > 187.50
| | | | | | | | | |--- lead_time <= 304.50
| | | | | | | | | | |--- avg_price_per_room <= 99.30
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- avg_price_per_room > 99.30
| | | | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | | |--- lead_time > 304.50
| | | | | | | | | | |--- arrival_month <= 9.00
| | | | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 9.00
| | | | | | | | | | | |--- weights: [0.00, 25.81] class: 1
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [8.20, 0.00] class: 0
| | | | | |--- total_nights > 3.50
| | | | | | |--- arrival_year <= 2017.50
| | | | | | | |--- weights: [14.17, 0.00] class: 0
| | | | | | |--- arrival_year > 2017.50
| | | | | | | |--- total_nights <= 11.50
| | | | | | | | |--- avg_price_per_room <= 69.40
| | | | | | | | | |--- avg_price_per_room <= 64.43
| | | | | | | | | | |--- avg_price_per_room <= 55.92
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 55.92
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- avg_price_per_room > 64.43
| | | | | | | | | | |--- weights: [8.20, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 69.40
| | | | | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | | | | |--- arrival_month <= 10.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- arrival_month > 10.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | |--- no_of_special_requests > 2.50
| | | | | | | | | | |--- weights: [5.22, 0.00] class: 0
| | | | | | | |--- total_nights > 11.50
| | | | | | | | |--- lead_time <= 198.00
| | | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | | |--- lead_time > 198.00
| | | | | | | | | |--- weights: [0.00, 10.63] class: 1
| | | |--- market_segment_type_Offline > 0.50
| | | | |--- lead_time <= 348.50
| | | | | |--- no_of_adults <= 2.50
| | | | | | |--- total_nights <= 7.50
| | | | | | | |--- arrival_date <= 30.50
| | | | | | | | |--- lead_time <= 331.00
| | | | | | | | | |--- weights: [108.85, 0.00] class: 0
| | | | | | | | |--- lead_time > 331.00
| | | | | | | | | |--- arrival_date <= 10.00
| | | | | | | | | | |--- weights: [5.96, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 10.00
| | | | | | | | | | |--- weights: [1.49, 1.52] class: 1
| | | | | | | |--- arrival_date > 30.50
| | | | | | | | |--- total_nights <= 5.00
| | | | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | | | |--- total_nights > 5.00
| | | | | | | | | |--- weights: [1.49, 1.52] class: 1
| | | | | | |--- total_nights > 7.50
| | | | | | | |--- no_of_special_requests <= 1.50
| | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | |--- no_of_special_requests > 1.50
| | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | |--- no_of_adults > 2.50
| | | | | | |--- lead_time <= 196.00
| | | | | | | |--- avg_price_per_room <= 94.95
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 94.95
| | | | | | | | |--- weights: [4.47, 0.00] class: 0
| | | | | | |--- lead_time > 196.00
| | | | | | | |--- arrival_date <= 21.50
| | | | | | | | |--- weights: [0.00, 3.04] class: 1
| | | | | | | |--- arrival_date > 21.50
| | | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | |--- lead_time > 348.50
| | | | | |--- total_nights <= 3.50
| | | | | | |--- arrival_date <= 18.50
| | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | |--- arrival_date > 18.50
| | | | | | | |--- weights: [0.75, 1.52] class: 1
| | | | | |--- total_nights > 3.50
| | | | | | |--- avg_price_per_room <= 58.50
| | | | | | | |--- weights: [0.75, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 58.50
| | | | | | | |--- weights: [4.47, 3.04] class: 0
| |--- avg_price_per_room > 100.04
| | |--- arrival_month <= 11.50
| | | |--- no_of_special_requests <= 2.50
| | | | |--- arrival_year <= 2017.50
| | | | | |--- weights: [0.00, 133.59] class: 1
| | | | |--- arrival_year > 2017.50
| | | | | |--- weights: [0.00, 3066.59] class: 1
| | | |--- no_of_special_requests > 2.50
| | | | |--- weights: [23.11, 0.00] class: 0
| | |--- arrival_month > 11.50
| | | |--- no_of_special_requests <= 0.50
| | | | |--- total_nights <= 1.50
| | | | | |--- weights: [0.75, 0.00] class: 0
| | | | |--- total_nights > 1.50
| | | | | |--- weights: [34.30, 0.00] class: 0
| | | |--- no_of_special_requests > 0.50
| | | | |--- arrival_date <= 24.50
| | | | | |--- weights: [3.73, 0.00] class: 0
| | | | |--- arrival_date > 24.50
| | | | | |--- lead_time <= 172.50
| | | | | | |--- avg_price_per_room <= 135.49
| | | | | | | |--- weights: [2.24, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 135.49
| | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | |--- lead_time > 172.50
| | | | | | |--- no_of_special_requests <= 1.50
| | | | | | | |--- weights: [0.00, 13.66] class: 1
| | | | | | |--- no_of_special_requests > 1.50
| | | | | | | |--- no_of_adults <= 2.50
| | | | | | | | |--- avg_price_per_room <= 139.01
| | | | | | | | | |--- weights: [1.49, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 139.01
| | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | |--- no_of_adults > 2.50
| | | | | | | | |--- arrival_date <= 26.50
| | | | | | | | | |--- weights: [0.00, 1.52] class: 1
| | | | | | | | |--- arrival_date > 26.50
| | | | | | | | | |--- weights: [0.00, 4.55] class: 1
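The leaf `weights` in the text tree above look like class-weighted totals rather than raw booking counts: the smallest non-zero entries are 0.75 for class 0 and 1.52 for class 1, which is what a single weighted sample in each class would contribute. The following is a minimal sketch of how such totals could be turned back into approximate counts. The per-sample weights 0.746 and 1.518 are assumed constants (the tree shows them rounded), and `approx_counts` is a hypothetical helper, not part of the notebook or of scikit-learn.

```python
# Assumed per-sample class weights (hypothetical constants, not read from the fitted model).
CLASS_WEIGHTS = (0.746, 1.518)  # (class 0, class 1)


def approx_counts(leaf_weights, class_weights=CLASS_WEIGHTS):
    """Convert a leaf's weighted class totals into approximate raw sample counts."""
    return tuple(round(w / cw) for w, cw in zip(leaf_weights, class_weights))


# Leaf weights copied from the text tree above
for leaf in [(1.49, 1.52), (0.00, 25.81), (108.85, 0.00)]:
    print(leaf, "->", approx_counts(leaf))
```

Under these assumed weights, a leaf such as `[1.49, 1.52]` holds roughly two class-0 bookings and one class-1 booking, yet it is labelled class 1 because the weighted total for class 1 is larger. Leaves this small are a sign that the unpruned tree is fitting individual bookings.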
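# Bind the return values so the notebook does not echo the Figure repr or plot_tree's list of Text annotations
fig = plt.figure(figsize=(20, 30))
_ = tree.plot_tree(decisiontree, feature_names=feature_names, filled=True, fontsize=9, node_ids=True, class_names=True)
plt.show()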
[Figure: plot of the fitted decision tree with node ids; root node #0 splits on lead_time <= 151.5 (gini = 0.5, samples = 25392, value = [12696.0, 12696.0])]
3.036]\nclass = y[0]'), Text(0.026597827278564606, 0.3472222222222222, 'node #249\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.027266035696946602, 0.3472222222222222, 'node #250\ntotal_nights <= 1.5\ngini = 0.323\nsamples = 9\nvalue = [5.964, 1.518]\nclass = y[0]'), Text(0.026931931487755603, 0.3194444444444444, 'node #251\ngini = 0.0\nsamples = 7\nvalue = [5.219, 0.0]\nclass = y[0]'), Text(0.027600139906137598, 0.3194444444444444, 'node #252\navg_price_per_room <= 94.075\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.027266035696946602, 0.2916666666666667, 'node #253\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.027934244115328598, 0.2916666666666667, 'node #254\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.027934244115328598, 0.4027777777777778, 'node #255\narrival_date <= 7.5\ngini = 0.349\nsamples = 16\nvalue = [10.438, 3.036]\nclass = y[0]'), Text(0.027600139906137598, 0.375, 'node #256\ngini = 0.465\nsamples = 9\nvalue = [5.219, 3.036]\nclass = y[0]'), Text(0.028268348324519594, 0.375, 'node #257\ngini = 0.0\nsamples = 7\nvalue = [5.219, 0.0]\nclass = y[0]'), Text(0.029604765161283585, 0.4305555555555556, 'node #258\nlead_time <= 3.5\ngini = 0.083\nsamples = 46\nvalue = [33.55, 1.518]\nclass = y[0]'), Text(0.02927066095209259, 0.4027777777777778, 'node #259\nlead_time <= 2.5\ngini = 0.191\nsamples = 18\nvalue = [12.674, 1.518]\nclass = y[0]'), Text(0.02893655674290159, 0.375, 'node #260\ngini = 0.0\nsamples = 17\nvalue = [12.674, 0.0]\nclass = y[0]'), Text(0.029604765161283585, 0.375, 'node #261\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.029938869370474585, 0.4027777777777778, 'node #262\ngini = -0.0\nsamples = 28\nvalue = [20.875, 0.0]\nclass = y[0]'), Text(0.03378106777617106, 0.4583333333333333, 'node #263\nlead_time <= 22.5\ngini = 0.474\nsamples = 51\nvalue = [29.077, 18.217]\nclass = y[0]'), Text(0.03160939041642957, 
0.4305555555555556, 'node #264\narrival_month <= 10.5\ngini = 0.426\nsamples = 39\nvalue = [23.858, 10.627]\nclass = y[0]'), Text(0.03060707778885658, 0.4027777777777778, 'node #265\narrival_month <= 8.5\ngini = 0.281\nsamples = 22\nvalue = [14.911, 3.036]\nclass = y[0]'), Text(0.03027297357966558, 0.375, 'node #266\narrival_year <= 2017.5\ngini = 0.482\nsamples = 8\nvalue = [4.473, 3.036]\nclass = y[0]'), Text(0.029604765161283585, 0.3472222222222222, 'node #267\navg_price_per_room <= 66.0\ngini = 0.489\nsamples = 5\nvalue = [2.237, 3.036]\nclass = y[1]'), Text(0.02927066095209259, 0.3194444444444444, 'node #268\ngini = -0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.029938869370474585, 0.3194444444444444, 'node #269\ntotal_nights <= 1.5\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.029604765161283585, 0.2916666666666667, 'node #270\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.03027297357966558, 0.2916666666666667, 'node #271\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.03094118199804758, 0.3472222222222222, 'node #272\nlead_time <= 3.0\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.03060707778885658, 0.3194444444444444, 'node #273\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.03127528620723858, 0.3194444444444444, 'node #274\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.03094118199804758, 0.375, 'node #275\ngini = -0.0\nsamples = 14\nvalue = [10.438, 0.0]\nclass = y[0]'), Text(0.03261170304400257, 0.4027777777777778, 'node #276\nlead_time <= 19.5\ngini = 0.497\nsamples = 17\nvalue = [8.947, 7.591]\nclass = y[0]'), Text(0.03194349462562057, 0.375, 'node #277\narrival_date <= 16.5\ngini = 0.442\nsamples = 8\nvalue = [2.982, 6.072]\nclass = y[1]'), Text(0.03160939041642957, 0.3472222222222222, 'node #278\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.03227759883481157, 
0.3472222222222222, 'node #279\navg_price_per_room <= 66.0\ngini = 0.317\nsamples = 6\nvalue = [1.491, 6.072]\nclass = y[1]'), Text(0.03194349462562057, 0.3194444444444444, 'node #280\ngini = 0.442\nsamples = 4\nvalue = [1.491, 3.036]\nclass = y[1]'), Text(0.03261170304400257, 0.3194444444444444, 'node #281\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.033279911462384563, 0.375, 'node #282\navg_price_per_room <= 66.5\ngini = 0.323\nsamples = 9\nvalue = [5.964, 1.518]\nclass = y[0]'), Text(0.032945807253193564, 0.3472222222222222, 'node #283\ngini = 0.349\nsamples = 8\nvalue = [5.219, 1.518]\nclass = y[0]'), Text(0.03361401567157556, 0.3472222222222222, 'node #284\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.035952745135912546, 0.4305555555555556, 'node #285\nlead_time <= 38.0\ngini = 0.483\nsamples = 12\nvalue = [5.219, 7.591]\nclass = y[1]'), Text(0.035284536717530554, 0.4027777777777778, 'node #286\ntotal_nights <= 2.5\ngini = 0.405\nsamples = 9\nvalue = [2.982, 7.591]\nclass = y[1]'), Text(0.034950432508339555, 0.375, 'node #287\narrival_date <= 15.5\ngini = 0.5\nsamples = 6\nvalue = [2.982, 3.036]\nclass = y[1]'), Text(0.034282224089957555, 0.3472222222222222, 'node #288\narrival_month <= 9.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.03394811988076656, 0.3194444444444444, 'node #289\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.034616328299148555, 0.3194444444444444, 'node #290\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.035618640926721554, 0.3472222222222222, 'node #291\narrival_month <= 8.0\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.035284536717530554, 0.3194444444444444, 'node #292\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.035952745135912546, 0.3194444444444444, 'node #293\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.035618640926721554, 0.375, 
'node #294\ngini = 0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.036620953554294546, 0.4027777777777778, 'node #295\nlead_time <= 48.0\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.036286849345103546, 0.375, 'node #296\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.036955057763485545, 0.375, 'node #297\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04088078222147977, 0.4861111111111111, 'node #298\nlead_time <= 43.5\ngini = 0.22\nsamples = 91\nvalue = [63.372, 9.109]\nclass = y[0]'), Text(0.04004552169850227, 0.4583333333333333, 'node #299\navg_price_per_room <= 107.5\ngini = 0.193\nsamples = 89\nvalue = [62.626, 7.591]\nclass = y[0]'), Text(0.03937731328012028, 0.4305555555555556, 'node #300\navg_price_per_room <= 81.5\ngini = 0.163\nsamples = 87\nvalue = [61.881, 6.072]\nclass = y[0]'), Text(0.03904320907092928, 0.4027777777777778, 'node #301\ntotal_nights <= 1.5\ngini = 0.216\nsamples = 62\nvalue = [43.242, 6.072]\nclass = y[0]'), Text(0.03762326618186754, 0.375, 'node #302\nlead_time <= 18.0\ngini = 0.097\nsamples = 39\nvalue = [28.331, 1.518]\nclass = y[0]'), Text(0.03728916197267654, 0.3472222222222222, 'node #303\ngini = 0.0\nsamples = 35\nvalue = [26.094, 0.0]\nclass = y[0]'), Text(0.03795737039105854, 0.3472222222222222, 'node #304\nlead_time <= 23.5\ngini = 0.482\nsamples = 4\nvalue = [2.237, 1.518]\nclass = y[0]'), Text(0.03762326618186754, 0.3194444444444444, 'node #305\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.03829147460024954, 0.3194444444444444, 'node #306\ngini = -0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.04046315195999102, 0.375, 'node #307\nlead_time <= 5.5\ngini = 0.358\nsamples = 23\nvalue = [14.911, 4.554]\nclass = y[0]'), Text(0.03962789143701353, 0.3472222222222222, 'node #308\ntotal_nights <= 2.5\ngini = 0.498\nsamples = 10\nvalue = [5.219, 4.554]\nclass = y[0]'), Text(0.03895968301863153, 
0.3194444444444444, 'node #309\narrival_date <= 23.0\ngini = 0.442\nsamples = 6\nvalue = [2.237, 4.554]\nclass = y[1]'), Text(0.03862557880944053, 0.2916666666666667, 'node #310\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.03929378722782253, 0.2916666666666667, 'node #311\narrival_month <= 7.5\ngini = 0.372\nsamples = 5\nvalue = [1.491, 4.554]\nclass = y[1]'), Text(0.03895968301863153, 0.2638888888888889, 'node #312\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.03962789143701353, 0.2638888888888889, 'node #313\nlead_time <= 2.5\ngini = 0.242\nsamples = 4\nvalue = [0.746, 4.554]\nclass = y[1]'), Text(0.03929378722782253, 0.2361111111111111, 'node #314\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.03996199564620452, 0.2361111111111111, 'node #315\narrival_month <= 8.5\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.03962789143701353, 0.20833333333333334, 'node #316\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.04029609985539552, 0.20833333333333334, 'node #317\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04029609985539552, 0.3194444444444444, 'node #318\navg_price_per_room <= 66.0\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.03996199564620452, 0.2916666666666667, 'node #319\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.04063020406458652, 0.2916666666666667, 'node #320\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04129841248296852, 0.3472222222222222, 'node #321\narrival_date <= 22.5\ngini = 0.0\nsamples = 13\nvalue = [9.692, 0.0]\nclass = y[0]'), Text(0.04096430827377752, 0.3194444444444444, 'node #322\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04163251669215951, 0.3194444444444444, 'node #323\ngini = 0.0\nsamples = 12\nvalue = [8.947, 0.0]\nclass = y[0]'), Text(0.03971141748931128, 0.4027777777777778, 'node 
#324\ngini = 0.0\nsamples = 25\nvalue = [18.639, 0.0]\nclass = y[0]'), Text(0.04071373011688427, 0.4305555555555556, 'node #325\navg_price_per_room <= 115.0\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.04037962590769327, 0.4027777777777778, 'node #326\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.04104783432607527, 0.4027777777777778, 'node #327\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04171604274445726, 0.4583333333333333, 'node #328\narrival_month <= 8.5\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.04138193853526626, 0.4305555555555556, 'node #329\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.04205014695364826, 0.4305555555555556, 'node #330\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04547471509785599, 0.5138888888888888, 'node #331\narrival_date <= 18.5\ngini = 0.482\nsamples = 44\nvalue = [24.603, 16.699]\nclass = y[0]'), Text(0.04422182431338975, 0.4861111111111111, 'node #332\narrival_date <= 16.5\ngini = 0.474\nsamples = 16\nvalue = [6.71, 10.627]\nclass = y[1]'), Text(0.04338656379041225, 0.4583333333333333, 'node #333\narrival_date <= 10.0\ngini = 0.447\nsamples = 10\nvalue = [5.964, 3.036]\nclass = y[0]'), Text(0.04271835537203026, 0.4305555555555556, 'node #334\narrival_year <= 2017.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.04238425116283926, 0.4027777777777778, 'node #335\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.04305245958122125, 0.4027777777777778, 'node #336\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04405477220879425, 0.4305555555555556, 'node #337\navg_price_per_room <= 70.0\ngini = 0.0\nsamples = 7\nvalue = [5.219, 0.0]\nclass = y[0]'), Text(0.04372066799960325, 0.4027777777777778, 'node #338\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.044388876417985244, 
0.4027777777777778, 'node #339\ngini = 0.0\nsamples = 6\nvalue = [4.473, 0.0]\nclass = y[0]'), Text(0.045057084836367244, 0.4583333333333333, 'node #340\narrival_month <= 9.5\ngini = 0.163\nsamples = 6\nvalue = [0.746, 7.591]\nclass = y[1]'), Text(0.044722980627176244, 0.4305555555555556, 'node #341\ngini = -0.0\nsamples = 5\nvalue = [0.0, 7.591]\nclass = y[1]'), Text(0.04539118904555824, 0.4305555555555556, 'node #342\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.046727605882322235, 0.4861111111111111, 'node #343\narrival_date <= 24.0\ngini = 0.378\nsamples = 28\nvalue = [17.893, 6.072]\nclass = y[0]'), Text(0.046393501673131235, 0.4583333333333333, 'node #344\narrival_date <= 22.5\ngini = 0.447\nsamples = 20\nvalue = [11.929, 6.072]\nclass = y[0]'), Text(0.046059397463940235, 0.4305555555555556, 'node #345\narrival_month <= 7.5\ngini = 0.4\nsamples = 19\nvalue = [11.929, 4.554]\nclass = y[0]'), Text(0.045725293254749236, 0.4027777777777778, 'node #346\narrival_date <= 21.5\ngini = 0.498\nsamples = 10\nvalue = [5.219, 4.554]\nclass = y[0]'), Text(0.045057084836367244, 0.375, 'node #347\narrival_month <= 6.5\ngini = 0.378\nsamples = 7\nvalue = [4.473, 1.518]\nclass = y[0]'), Text(0.044722980627176244, 0.3472222222222222, 'node #348\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04539118904555824, 0.3472222222222222, 'node #349\ngini = 0.411\nsamples = 6\nvalue = [3.728, 1.518]\nclass = y[0]'), Text(0.046393501673131235, 0.375, 'node #350\ntotal_nights <= 1.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.046059397463940235, 0.3472222222222222, 'node #351\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.046727605882322235, 0.3472222222222222, 'node #352\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.046393501673131235, 0.4027777777777778, 'node #353\ngini = 0.0\nsamples = 9\nvalue = [6.71, 0.0]\nclass = y[0]'), Text(0.046727605882322235, 
0.4305555555555556, 'node #354\ngini = -0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.047061710091513234, 0.4583333333333333, 'node #355\ngini = 0.0\nsamples = 8\nvalue = [5.964, 0.0]\nclass = y[0]'), Text(0.03407862933748179, 0.5694444444444444, 'node #356\ngini = 0.0\nsamples = 40\nvalue = [29.822, 0.0]\nclass = y[0]'), Text(0.04940043955585022, 0.5972222222222222, 'node #357\nlead_time <= 10.5\ngini = 0.485\nsamples = 31\nvalue = [17.148, 12.145]\nclass = y[0]'), Text(0.048064022719086226, 0.5694444444444444, 'node #358\nroom_type_reserved_Room_Type 5 <= 0.5\ngini = 0.385\nsamples = 12\nvalue = [3.728, 10.627]\nclass = y[1]'), Text(0.04739581430070423, 0.5416666666666666, 'node #359\nlead_time <= 3.5\ngini = 0.216\nsamples = 9\nvalue = [1.491, 10.627]\nclass = y[1]'), Text(0.047061710091513234, 0.5138888888888888, 'node #360\ngini = 0.0\nsamples = 5\nvalue = [0.0, 7.591]\nclass = y[1]'), Text(0.047729918509895226, 0.5138888888888888, 'node #361\narrival_month <= 9.5\ngini = 0.442\nsamples = 4\nvalue = [1.491, 3.036]\nclass = y[1]'), Text(0.04739581430070423, 0.4861111111111111, 'node #362\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.048064022719086226, 0.4861111111111111, 'node #363\narrival_date <= 10.0\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.047729918509895226, 0.4583333333333333, 'node #364\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.048398126928277226, 0.4583333333333333, 'node #365\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04873223113746822, 0.5416666666666666, 'node #366\nlead_time <= 3.5\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.048398126928277226, 0.5138888888888888, 'node #367\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.04906633534665922, 0.5138888888888888, 'node #368\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.05073685639261421, 
0.5694444444444444, 'node #369\narrival_month <= 9.5\ngini = 0.183\nsamples = 19\nvalue = [13.42, 1.518]\nclass = y[0]'), Text(0.05006864797423221, 0.5416666666666666, 'node #370\nlead_time <= 12.5\ngini = 0.447\nsamples = 5\nvalue = [2.982, 1.518]\nclass = y[0]'), Text(0.04973454376504122, 0.5138888888888888, 'node #371\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.05040275218342321, 0.5138888888888888, 'node #372\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.05140506481099621, 0.5416666666666666, 'node #373\narrival_month <= 10.5\ngini = 0.0\nsamples = 14\nvalue = [10.438, 0.0]\nclass = y[0]'), Text(0.05107096060180521, 0.5138888888888888, 'node #374\ngini = 0.0\nsamples = 11\nvalue = [8.201, 0.0]\nclass = y[0]'), Text(0.0517391690201872, 0.5138888888888888, 'node #375\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.042240690760452504, 0.625, 'node #376\ngini = 0.0\nsamples = 55\nvalue = [41.005, 0.0]\nclass = y[0]'), Text(0.0565419170273078, 0.6805555555555556, 'node #377\nlead_time <= 9.5\ngini = 0.49\nsamples = 59\nvalue = [32.059, 24.29]\nclass = y[0]'), Text(0.05499668505979943, 0.6527777777777778, 'node #378\narrival_date <= 13.0\ngini = 0.411\nsamples = 42\nvalue = [26.094, 10.627]\nclass = y[0]'), Text(0.05466258085060843, 0.625, 'node #379\nlead_time <= 4.5\ngini = 0.5\nsamples = 21\nvalue = [10.438, 10.627]\nclass = y[1]'), Text(0.05324263796154669, 0.5972222222222222, 'node #380\narrival_month <= 9.5\ngini = 0.447\nsamples = 15\nvalue = [8.947, 4.554]\nclass = y[0]'), Text(0.05290853375235569, 0.5694444444444444, 'node #381\ngini = 0.0\nsamples = 5\nvalue = [3.728, 0.0]\nclass = y[0]'), Text(0.05357674217073769, 0.5694444444444444, 'node #382\narrival_date <= 8.5\ngini = 0.498\nsamples = 10\nvalue = [5.219, 4.554]\nclass = y[0]'), Text(0.0527414816477602, 0.5416666666666666, 'node #383\narrival_month <= 10.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), 
Text(0.0524073774385692, 0.5138888888888888, 'node #384\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.05307558585695119, 0.5138888888888888, 'node #385\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.054412002693715183, 0.5416666666666666, 'node #386\ntotal_nights <= 1.5\ngini = 0.378\nsamples = 7\nvalue = [4.473, 1.518]\nclass = y[0]'), Text(0.05374379427533319, 0.5138888888888888, 'node #387\narrival_month <= 10.5\ngini = 0.482\nsamples = 4\nvalue = [2.237, 1.518]\nclass = y[0]'), Text(0.05340969006614219, 0.4861111111111111, 'node #388\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.05407789848452419, 0.4861111111111111, 'node #389\ngini = 0.5\nsamples = 3\nvalue = [1.491, 1.518]\nclass = y[1]'), Text(0.05508021111209718, 0.5138888888888888, 'node #390\narrival_date <= 11.0\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.05474610690290618, 0.4861111111111111, 'node #391\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.05541431532128818, 0.4861111111111111, 'node #392\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.056082523739670175, 0.5972222222222222, 'node #393\nlead_time <= 5.5\ngini = 0.317\nsamples = 6\nvalue = [1.491, 6.072]\nclass = y[1]'), Text(0.05574841953047918, 0.5694444444444444, 'node #394\narrival_date <= 9.0\ngini = 0.195\nsamples = 5\nvalue = [0.746, 6.072]\nclass = y[1]'), Text(0.05541431532128818, 0.5416666666666666, 'node #395\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.056082523739670175, 0.5416666666666666, 'node #396\narrival_month <= 9.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.05574841953047918, 0.5138888888888888, 'node #397\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.056416627948861174, 0.5138888888888888, 'node #398\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.056416627948861174, 
0.5694444444444444, 'node #399\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.05533078926899043, 0.625, 'node #400\ngini = -0.0\nsamples = 21\nvalue = [15.657, 0.0]\nclass = y[0]'), Text(0.058087148994816165, 0.6527777777777778, 'node #401\nno_of_adults <= 1.5\ngini = 0.423\nsamples = 17\nvalue = [5.964, 13.663]\nclass = y[1]'), Text(0.057753044785625166, 0.625, 'node #402\narrival_date <= 16.5\ngini = 0.5\nsamples = 12\nvalue = [5.964, 6.072]\nclass = y[1]'), Text(0.057418940576434166, 0.5972222222222222, 'node #403\nlead_time <= 12.0\ngini = 0.0\nsamples = 8\nvalue = [5.964, 0.0]\nclass = y[0]'), Text(0.05708483636724317, 0.5694444444444444, 'node #404\ngini = 0.0\nsamples = 6\nvalue = [4.473, 0.0]\nclass = y[0]'), Text(0.057753044785625166, 0.5694444444444444, 'node #405\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.058087148994816165, 0.5972222222222222, 'node #406\ngini = 0.0\nsamples = 4\nvalue = [0.0, 6.072]\nclass = y[1]'), Text(0.058421253204007165, 0.625, 'node #407\ngini = -0.0\nsamples = 5\nvalue = [0.0, 7.591]\nclass = y[1]'), Text(0.04972540810307115, 0.7083333333333334, 'node #408\ngini = -0.0\nsamples = 178\nvalue = [132.708, 0.0]\nclass = y[0]'), Text(0.06265497997984934, 0.7361111111111112, 'node #409\navg_price_per_room <= 50.0\ngini = 0.084\nsamples = 1811\nvalue = [1320.372, 60.725]\nclass = y[0]'), Text(0.05942356583158016, 0.7083333333333334, 'node #410\narrival_month <= 9.5\ngini = 0.447\nsamples = 30\nvalue = [17.893, 9.109]\nclass = y[0]'), Text(0.05908946162238916, 0.6805555555555556, 'node #411\ntotal_nights <= 2.5\ngini = 0.412\nsamples = 11\nvalue = [3.728, 9.109]\nclass = y[1]'), Text(0.05875535741319816, 0.6527777777777778, 'node #412\ngini = 0.0\nsamples = 6\nvalue = [0.0, 9.109]\nclass = y[1]'), Text(0.05942356583158016, 0.6527777777777778, 'node #413\narrival_date <= 8.0\ngini = 0.0\nsamples = 5\nvalue = [3.728, 0.0]\nclass = y[0]'), Text(0.05908946162238916, 0.625, 'node #414\ngini 
= 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.059757670040771156, 0.625, 'node #415\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.059757670040771156, 0.6805555555555556, 'node #416\ngini = -0.0\nsamples = 19\nvalue = [14.165, 0.0]\nclass = y[0]'), Text(0.06588639412811852, 0.7083333333333334, 'node #417\narrival_date <= 1.5\ngini = 0.073\nsamples = 1781\nvalue = [1302.479, 51.616]\nclass = y[0]'), Text(0.06142819108672615, 0.6805555555555556, 'node #418\navg_price_per_room <= 89.75\ngini = 0.386\nsamples = 27\nvalue = [17.148, 6.072]\nclass = y[0]'), Text(0.06075998266834415, 0.6527777777777778, 'node #419\narrival_month <= 8.5\ngini = 0.0\nsamples = 21\nvalue = [15.657, 0.0]\nclass = y[0]'), Text(0.06042587845915315, 0.625, 'node #420\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.06109408687753515, 0.625, 'node #421\ngini = 0.0\nsamples = 18\nvalue = [13.42, 0.0]\nclass = y[0]'), Text(0.06209639950510814, 0.6527777777777778, 'node #422\nno_of_adults <= 1.5\ngini = 0.317\nsamples = 6\nvalue = [1.491, 6.072]\nclass = y[1]'), Text(0.06176229529591714, 0.625, 'node #423\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.06243050371429914, 0.625, 'node #424\ngini = 0.0\nsamples = 4\nvalue = [0.0, 6.072]\nclass = y[1]'), Text(0.0703445971695109, 0.6805555555555556, 'node #425\navg_price_per_room <= 85.55\ngini = 0.066\nsamples = 1754\nvalue = [1285.331, 45.543]\nclass = y[0]'), Text(0.06606388698925124, 0.6527777777777778, 'node #426\narrival_month <= 9.5\ngini = 0.102\nsamples = 773\nvalue = [560.655, 31.88]\nclass = y[0]'), Text(0.06309871213268113, 0.625, 'node #427\nno_of_children <= 0.5\ngini = 0.219\nsamples = 229\nvalue = [159.548, 22.772]\nclass = y[0]'), Text(0.06276460792349013, 0.5972222222222222, 'node #428\nlead_time <= 42.5\ngini = 0.207\nsamples = 228\nvalue = [159.548, 21.254]\nclass = y[0]'), Text(0.0610105608252374, 0.5694444444444444, 'node #429\nlead_time <= 
41.5\ngini = 0.299\nsamples = 121\nvalue = [81.265, 18.217]\nclass = y[0]'), Text(0.0606764566160464, 0.5416666666666666, 'node #430\narrival_date <= 17.5\ngini = 0.265\nsamples = 119\nvalue = [81.265, 15.181]\nclass = y[0]'), Text(0.059507091883877906, 0.5138888888888888, 'node #431\narrival_date <= 9.5\ngini = 0.374\nsamples = 57\nvalue = [36.532, 12.145]\nclass = y[0]'), Text(0.05917298767468691, 0.4861111111111111, 'node #432\ngini = 0.0\nsamples = 14\nvalue = [10.438, 0.0]\nclass = y[0]'), Text(0.059841196093068906, 0.4861111111111111, 'node #433\nlead_time <= 3.5\ngini = 0.433\nsamples = 43\nvalue = [26.094, 12.145]\nclass = y[0]'), Text(0.059507091883877906, 0.4583333333333333, 'node #434\ngini = 0.0\nsamples = 11\nvalue = [8.201, 0.0]\nclass = y[0]'), Text(0.0601753003022599, 0.4583333333333333, 'node #435\nlead_time <= 4.5\ngini = 0.482\nsamples = 32\nvalue = [17.893, 12.145]\nclass = y[0]'), Text(0.059841196093068906, 0.4305555555555556, 'node #436\ngini = 0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.0605094045114509, 0.4305555555555556, 'node #437\navg_price_per_room <= 66.525\ngini = 0.418\nsamples = 29\nvalue = [17.893, 7.591]\nclass = y[0]'), Text(0.0601753003022599, 0.4027777777777778, 'node #438\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.0608435087206419, 0.4027777777777778, 'node #439\navg_price_per_room <= 84.5\ngini = 0.378\nsamples = 28\nvalue = [17.893, 6.072]\nclass = y[0]'), Text(0.05934003977928241, 0.375, 'node #440\nlead_time <= 30.0\ngini = 0.2\nsamples = 17\nvalue = [11.929, 1.518]\nclass = y[0]'), Text(0.05900593557009141, 0.3472222222222222, 'node #441\ngini = 0.0\nsamples = 11\nvalue = [8.201, 0.0]\nclass = y[0]'), Text(0.059674143988473406, 0.3472222222222222, 'node #442\narrival_month <= 8.5\ngini = 0.411\nsamples = 6\nvalue = [3.728, 1.518]\nclass = y[0]'), Text(0.05934003977928241, 0.3194444444444444, 'node #443\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), 
Text(0.10987330141942085, 0.7083333333333334, 'node #758\navg_price_per_room <= 63.25\ngini = 0.098\nsamples = 10\nvalue = [0.746, 13.663]\nclass = y[1]'), Text(0.10953919721022985, 0.6805555555555556, 'node #759\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11020740562861185, 0.6805555555555556, 'node #760\ngini = 0.0\nsamples = 9\nvalue = [0.0, 13.663]\nclass = y[1]'), Text(0.11424275803024687, 0.7916666666666666, 'node #761\narrival_month <= 3.5\ngini = 0.427\nsamples = 211\nvalue = [128.98, 57.688]\nclass = y[0]'), Text(0.11221203088375784, 0.7638888888888888, 'node #762\navg_price_per_room <= 88.5\ngini = 0.092\nsamples = 82\nvalue = [59.644, 3.036]\nclass = y[0]'), Text(0.11154382246537584, 0.7361111111111112, 'node #763\ntotal_nights <= 1.5\ngini = 0.049\nsamples = 80\nvalue = [58.899, 1.518]\nclass = y[0]'), Text(0.11120971825618484, 0.7083333333333334, 'node #764\navg_price_per_room <= 80.5\ngini = 0.123\nsamples = 30\nvalue = [21.621, 1.518]\nclass = y[0]'), Text(0.11087561404699385, 0.6805555555555556, 'node #765\nno_of_adults <= 1.5\ngini = 0.378\nsamples = 7\nvalue = [4.473, 1.518]\nclass = y[0]'), Text(0.11054150983780285, 0.6527777777777778, 'node #766\ngini = 0.411\nsamples = 6\nvalue = [3.728, 1.518]\nclass = y[0]'), Text(0.11120971825618484, 0.6527777777777778, 'node #767\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11154382246537584, 0.6805555555555556, 'node #768\ngini = 0.0\nsamples = 23\nvalue = [17.148, 0.0]\nclass = y[0]'), Text(0.11187792667456684, 0.7083333333333334, 'node #769\ngini = 0.0\nsamples = 50\nvalue = [37.278, 0.0]\nclass = y[0]'), Text(0.11288023930213983, 0.7361111111111112, 'node #770\nmarket_segment_type_Offline <= 0.5\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.11254613509294883, 0.7083333333333334, 'node #771\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.11321434351133083, 0.7083333333333334, 'node #772\ngini = 
-0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1162734851767359, 0.7638888888888888, 'node #773\narrival_month <= 4.5\ngini = 0.493\nsamples = 129\nvalue = [69.336, 54.652]\nclass = y[0]'), Text(0.11421665613890382, 0.7361111111111112, 'node #774\navg_price_per_room <= 80.375\ngini = 0.151\nsamples = 13\nvalue = [1.491, 16.699]\nclass = y[1]'), Text(0.11388255192971283, 0.7083333333333334, 'node #775\ngini = -0.0\nsamples = 11\nvalue = [0.0, 16.699]\nclass = y[1]'), Text(0.11455076034809482, 0.7083333333333334, 'node #776\ntotal_nights <= 4.5\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.11421665613890382, 0.6805555555555556, 'node #777\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11488486455728582, 0.6805555555555556, 'node #778\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.118330314214568, 0.7361111111111112, 'node #779\nno_of_adults <= 1.5\ngini = 0.46\nsamples = 116\nvalue = [67.845, 37.953]\nclass = y[0]'), Text(0.11622128139404982, 0.7083333333333334, 'node #780\navg_price_per_room <= 86.0\ngini = 0.462\nsamples = 28\nvalue = [11.183, 19.736]\nclass = y[1]'), Text(0.11555307297566782, 0.6805555555555556, 'node #781\narrival_month <= 8.0\ngini = 0.208\nsamples = 14\nvalue = [2.237, 16.699]\nclass = y[1]'), Text(0.11521896876647682, 0.6527777777777778, 'node #782\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11588717718485882, 0.6527777777777778, 'node #783\navg_price_per_room <= 77.07\ngini = 0.151\nsamples = 13\nvalue = [1.491, 16.699]\nclass = y[1]'), Text(0.11555307297566782, 0.625, 'node #784\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11622128139404982, 0.625, 'node #785\nlead_time <= 101.5\ngini = 0.082\nsamples = 12\nvalue = [0.746, 16.699]\nclass = y[1]'), Text(0.11588717718485882, 0.5972222222222222, 'node #786\narrival_month <= 9.5\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), 
Text(0.11555307297566782, 0.5694444444444444, 'node #787\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.11622128139404982, 0.5694444444444444, 'node #788\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.11655538560324082, 0.5972222222222222, 'node #789\ngini = 0.0\nsamples = 10\nvalue = [0.0, 15.181]\nclass = y[1]'), Text(0.11688948981243182, 0.6805555555555556, 'node #790\narrival_date <= 9.0\ngini = 0.378\nsamples = 14\nvalue = [8.947, 3.036]\nclass = y[0]'), Text(0.11655538560324082, 0.6527777777777778, 'node #791\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1172235940216228, 0.6527777777777778, 'node #792\ngini = 0.394\nsamples = 13\nvalue = [8.201, 3.036]\nclass = y[0]'), Text(0.12043934703508616, 0.7083333333333334, 'node #793\narrival_date <= 22.5\ngini = 0.368\nsamples = 88\nvalue = [56.662, 18.217]\nclass = y[0]'), Text(0.11931174532906655, 0.6805555555555556, 'node #794\nno_of_adults <= 2.5\ngini = 0.168\nsamples = 63\nvalue = [44.733, 4.554]\nclass = y[0]'), Text(0.1183929587537913, 0.6527777777777778, 'node #795\nroom_type_reserved_Room_Type 4 <= 0.5\ngini = 0.121\nsamples = 61\nvalue = [43.988, 3.036]\nclass = y[0]'), Text(0.1175576982308138, 0.625, 'node #796\narrival_month <= 5.5\ngini = 0.0\nsamples = 50\nvalue = [37.278, 0.0]\nclass = y[0]'), Text(0.1172235940216228, 0.5972222222222222, 'node #797\ngini = 0.0\nsamples = 22\nvalue = [16.402, 0.0]\nclass = y[0]'), Text(0.1178918024400048, 0.5972222222222222, 'node #798\ngini = 0.0\nsamples = 28\nvalue = [20.875, 0.0]\nclass = y[0]'), Text(0.1192282192767688, 0.625, 'node #799\narrival_date <= 6.5\ngini = 0.429\nsamples = 11\nvalue = [6.71, 3.036]\nclass = y[0]'), Text(0.1185600108583868, 0.5972222222222222, 'node #800\nlead_time <= 96.5\ngini = 0.442\nsamples = 4\nvalue = [1.491, 3.036]\nclass = y[1]'), Text(0.1182259066491958, 0.5694444444444444, 'node #801\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), 
Text(0.1188941150675778, 0.5694444444444444, 'node #802\ngini = -0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.11989642769515078, 0.5972222222222222, 'node #803\ntotal_nights <= 2.5\ngini = 0.0\nsamples = 7\nvalue = [5.219, 0.0]\nclass = y[0]'), Text(0.1195623234859598, 0.5694444444444444, 'node #804\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.12023053190434178, 0.5694444444444444, 'node #805\ngini = 0.0\nsamples = 5\nvalue = [3.728, 0.0]\nclass = y[0]'), Text(0.12023053190434178, 0.6527777777777778, 'node #806\navg_price_per_room <= 89.505\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.11989642769515078, 0.625, 'node #807\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.12056463611353278, 0.625, 'node #808\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.12156694874110578, 0.6805555555555556, 'node #809\nlead_time <= 96.5\ngini = 0.498\nsamples = 25\nvalue = [11.929, 13.663]\nclass = y[1]'), Text(0.12123284453191478, 0.6527777777777778, 'node #810\ngini = 0.0\nsamples = 8\nvalue = [5.964, 0.0]\nclass = y[0]'), Text(0.12190105295029678, 0.6527777777777778, 'node #811\navg_price_per_room <= 87.375\ngini = 0.423\nsamples = 17\nvalue = [5.964, 13.663]\nclass = y[1]'), Text(0.12123284453191478, 0.625, 'node #812\narrival_month <= 6.5\ngini = 0.294\nsamples = 13\nvalue = [2.982, 13.663]\nclass = y[1]'), Text(0.12089874032272378, 0.5972222222222222, 'node #813\ngini = 0.0\nsamples = 6\nvalue = [0.0, 9.109]\nclass = y[1]'), Text(0.12156694874110578, 0.5972222222222222, 'node #814\narrival_month <= 9.5\ngini = 0.478\nsamples = 7\nvalue = [2.982, 4.554]\nclass = y[1]'), Text(0.12123284453191478, 0.5694444444444444, 'node #815\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.12190105295029678, 0.5694444444444444, 'node #816\ntotal_nights <= 3.5\ngini = 0.242\nsamples = 4\nvalue = [0.746, 4.554]\nclass = y[1]'), Text(0.12156694874110578, 
0.5416666666666666, 'node #817\narrival_date <= 26.5\ngini = 0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.12123284453191478, 0.5138888888888888, 'node #818\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.12190105295029678, 0.5138888888888888, 'node #819\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.12223515715948778, 0.5416666666666666, 'node #820\ngini = -0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.12256926136867878, 0.625, 'node #821\narrival_date <= 23.5\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.12223515715948778, 0.5972222222222222, 'node #822\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.12290336557786977, 0.5972222222222222, 'node #823\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.12799845476803248, 0.8194444444444444, 'node #824\narrival_date <= 11.5\ngini = 0.349\nsamples = 299\nvalue = [82.756, 285.406]\nclass = y[1]'), Text(0.12490799083301576, 0.7916666666666666, 'node #825\narrival_month <= 7.5\ngini = 0.494\nsamples = 79\nvalue = [36.532, 45.543]\nclass = y[1]'), Text(0.12357157399625177, 0.7638888888888888, 'node #826\nlead_time <= 108.5\ngini = 0.469\nsamples = 44\nvalue = [25.349, 15.181]\nclass = y[0]'), Text(0.12290336557786977, 0.7361111111111112, 'node #827\nno_of_adults <= 1.5\ngini = 0.499\nsamples = 26\nvalue = [12.674, 13.663]\nclass = y[1]'), Text(0.12256926136867878, 0.7083333333333334, 'node #828\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.12323746978706077, 0.7083333333333334, 'node #829\nlead_time <= 102.0\ngini = 0.486\nsamples = 22\nvalue = [9.692, 13.663]\nclass = y[1]'), Text(0.12290336557786977, 0.6805555555555556, 'node #830\ntotal_nights <= 3.5\ngini = 0.5\nsamples = 19\nvalue = [9.692, 9.109]\nclass = y[0]'), Text(0.12256926136867878, 0.6527777777777778, 'node #831\ngini = 0.499\nsamples = 17\nvalue = [8.201, 9.109]\nclass = y[1]'), 
Text(0.12323746978706077, 0.6527777777777778, 'node #832\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.12357157399625177, 0.6805555555555556, 'node #833\ngini = -0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.12423978241463376, 0.7361111111111112, 'node #834\ntotal_nights <= 2.5\ngini = 0.191\nsamples = 18\nvalue = [12.674, 1.518]\nclass = y[0]'), Text(0.12390567820544277, 0.7083333333333334, 'node #835\ngini = 0.248\nsamples = 13\nvalue = [8.947, 1.518]\nclass = y[0]'), Text(0.12457388662382476, 0.7083333333333334, 'node #836\ngini = 0.0\nsamples = 5\nvalue = [3.728, 0.0]\nclass = y[0]'), Text(0.12624440766977976, 0.7638888888888888, 'node #837\nlead_time <= 110.5\ngini = 0.393\nsamples = 35\nvalue = [11.183, 30.362]\nclass = y[1]'), Text(0.12557619925139776, 0.7361111111111112, 'node #838\navg_price_per_room <= 116.75\ngini = 0.378\nsamples = 7\nvalue = [4.473, 1.518]\nclass = y[0]'), Text(0.12524209504220676, 0.7083333333333334, 'node #839\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.12591030346058876, 0.7083333333333334, 'node #840\ngini = 0.5\nsamples = 3\nvalue = [1.491, 1.518]\nclass = y[1]'), Text(0.12691261608816176, 0.7361111111111112, 'node #841\ntotal_nights <= 2.0\ngini = 0.306\nsamples = 28\nvalue = [6.71, 28.844]\nclass = y[1]'), Text(0.12657851187897076, 0.7083333333333334, 'node #842\ngini = 0.0\nsamples = 8\nvalue = [0.0, 12.145]\nclass = y[1]'), Text(0.12724672029735273, 0.7083333333333334, 'node #843\nlead_time <= 112.0\ngini = 0.409\nsamples = 20\nvalue = [6.71, 16.699]\nclass = y[1]'), Text(0.12691261608816176, 0.6805555555555556, 'node #844\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.12758082450654373, 0.6805555555555556, 'node #845\navg_price_per_room <= 128.65\ngini = 0.363\nsamples = 18\nvalue = [5.219, 16.699]\nclass = y[1]'), Text(0.12724672029735273, 0.6527777777777778, 'node #846\narrival_year <= 2017.5\ngini = 0.333\nsamples = 17\nvalue = 
[4.473, 16.699]\nclass = y[1]'), Text(0.12691261608816176, 0.625, 'node #847\ngini = 0.393\nsamples = 14\nvalue = [4.473, 12.145]\nclass = y[1]'), Text(0.12758082450654373, 0.625, 'node #848\ngini = -0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.12791492871573473, 0.6527777777777778, 'node #849\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.13108891870304923, 0.7916666666666666, 'node #850\navg_price_per_room <= 102.09\ngini = 0.271\nsamples = 220\nvalue = [46.224, 239.862]\nclass = y[1]'), Text(0.12891724134330773, 0.7638888888888888, 'node #851\narrival_date <= 14.5\ngini = 0.067\nsamples = 102\nvalue = [5.219, 144.221]\nclass = y[1]'), Text(0.12858313713411673, 0.7361111111111112, 'node #852\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.12925134555249873, 0.7361111111111112, 'node #853\narrival_month <= 2.5\ngini = 0.049\nsamples = 100\nvalue = [3.728, 144.221]\nclass = y[1]'), Text(0.12891724134330773, 0.7083333333333334, 'node #854\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.12958544976168973, 0.7083333333333334, 'node #855\navg_price_per_room <= 95.44\ngini = 0.04\nsamples = 99\nvalue = [2.982, 144.221]\nclass = y[1]'), Text(0.12891724134330773, 0.6805555555555556, 'node #856\narrival_month <= 6.0\ngini = 0.163\nsamples = 24\nvalue = [2.982, 30.362]\nclass = y[1]'), Text(0.12858313713411673, 0.6527777777777778, 'node #857\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.12925134555249873, 0.6527777777777778, 'node #858\nlead_time <= 106.5\ngini = 0.128\nsamples = 23\nvalue = [2.237, 30.362]\nclass = y[1]'), Text(0.12891724134330773, 0.625, 'node #859\narrival_year <= 2017.5\ngini = 0.082\nsamples = 12\nvalue = [0.746, 16.699]\nclass = y[1]'), Text(0.12858313713411673, 0.5972222222222222, 'node #860\ngini = 0.089\nsamples = 11\nvalue = [0.746, 15.181]\nclass = y[1]'), Text(0.12925134555249873, 0.5972222222222222, 'node #861\ngini = 0.0\nsamples = 
1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.12958544976168973, 0.625, 'node #862\ngini = 0.177\nsamples = 11\nvalue = [1.491, 13.663]\nclass = y[1]'), Text(0.13025365818007173, 0.6805555555555556, 'node #863\ntotal_nights <= 2.5\ngini = 0.0\nsamples = 75\nvalue = [0.0, 113.859]\nclass = y[1]'), Text(0.12991955397088073, 0.6527777777777778, 'node #864\ngini = 0.0\nsamples = 38\nvalue = [0.0, 57.688]\nclass = y[1]'), Text(0.13058776238926273, 0.6527777777777778, 'node #865\ngini = 0.0\nsamples = 37\nvalue = [0.0, 56.17]\nclass = y[1]'), Text(0.1332605960627907, 0.7638888888888888, 'node #866\navg_price_per_room <= 109.5\ngini = 0.42\nsamples = 118\nvalue = [41.005, 95.641]\nclass = y[1]'), Text(0.13192417922602673, 0.7361111111111112, 'node #867\ntotal_nights <= 1.5\ngini = 0.44\nsamples = 57\nvalue = [34.295, 16.699]\nclass = y[0]'), Text(0.13125597080764473, 0.7083333333333334, 'node #868\navg_price_per_room <= 108.5\ngini = 0.082\nsamples = 12\nvalue = [0.746, 16.699]\nclass = y[1]'), Text(0.13092186659845373, 0.6805555555555556, 'node #869\ngini = 0.0\nsamples = 11\nvalue = [0.0, 16.699]\nclass = y[1]'), Text(0.13159007501683573, 0.6805555555555556, 'node #870\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.13259238764440873, 0.7083333333333334, 'node #871\narrival_month <= 6.0\ngini = 0.0\nsamples = 45\nvalue = [33.55, 0.0]\nclass = y[0]'), Text(0.13225828343521773, 0.6805555555555556, 'node #872\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.1329264918535997, 0.6805555555555556, 'node #873\ngini = 0.0\nsamples = 42\nvalue = [31.313, 0.0]\nclass = y[0]'), Text(0.1345970128995547, 0.7361111111111112, 'node #874\navg_price_per_room <= 124.25\ngini = 0.144\nsamples = 61\nvalue = [6.71, 78.942]\nclass = y[1]'), Text(0.1339288044811727, 0.7083333333333334, 'node #875\narrival_date <= 19.5\ngini = 0.073\nsamples = 54\nvalue = [2.982, 75.906]\nclass = y[1]'), Text(0.1335947002719817, 0.6805555555555556, 'node 
#876\ngini = 0.0\nsamples = 47\nvalue = [0.0, 71.351]\nclass = y[1]'), Text(0.1342629086903637, 0.6805555555555556, 'node #877\navg_price_per_room <= 114.58\ngini = 0.478\nsamples = 7\nvalue = [2.982, 4.554]\nclass = y[1]'), Text(0.1339288044811727, 0.6527777777777778, 'node #878\ngini = 0.0\nsamples = 4\nvalue = [2.982, 0.0]\nclass = y[0]'), Text(0.1345970128995547, 0.6527777777777778, 'node #879\ngini = 0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.1352652213179367, 0.7083333333333334, 'node #880\narrival_date <= 27.5\ngini = 0.495\nsamples = 7\nvalue = [3.728, 3.036]\nclass = y[0]'), Text(0.1349311171087457, 0.6805555555555556, 'node #881\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.1355993255271277, 0.6805555555555556, 'node #882\ntotal_nights <= 3.5\ngini = 0.442\nsamples = 4\nvalue = [1.491, 3.036]\nclass = y[1]'), Text(0.1352652213179367, 0.6527777777777778, 'node #883\narrival_year <= 2017.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.1349311171087457, 0.625, 'node #884\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.1355993255271277, 0.625, 'node #885\navg_price_per_room <= 177.835\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.1352652213179367, 0.5972222222222222, 'node #886\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1359334297363187, 0.5972222222222222, 'node #887\ngini = -0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.1359334297363187, 0.6527777777777778, 'node #888\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1421926632803813, 0.8472222222222222, 'node #889\nno_of_adults <= 1.5\ngini = 0.414\nsamples = 509\nvalue = [315.368, 130.558]\nclass = y[0]'), Text(0.1415244548619993, 0.8194444444444444, 'node #890\navg_price_per_room <= 122.0\ngini = 0.055\nsamples = 143\nvalue = [105.123, 3.036]\nclass = y[0]'), Text(0.1411903506528083, 0.7916666666666666, 'node 
#891\ngini = 0.0\nsamples = 141\nvalue = [105.123, 0.0]\nclass = y[0]'), Text(0.1418585590711903, 0.7916666666666666, 'node #892\ngini = -0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.1428608716987633, 0.8194444444444444, 'node #893\narrival_month <= 11.5\ngini = 0.47\nsamples = 366\nvalue = [210.246, 127.522]\nclass = y[0]'), Text(0.1425267674895723, 0.7916666666666666, 'node #894\narrival_date <= 7.5\ngini = 0.493\nsamples = 301\nvalue = [161.785, 127.522]\nclass = y[0]'), Text(0.1376039507822737, 0.7638888888888888, 'node #895\nlead_time <= 150.5\ngini = 0.177\nsamples = 59\nvalue = [41.751, 4.554]\nclass = y[0]'), Text(0.1372698465730827, 0.7361111111111112, 'node #896\narrival_month <= 5.0\ngini = 0.126\nsamples = 58\nvalue = [41.751, 3.036]\nclass = y[0]'), Text(0.1369357423638917, 0.7083333333333334, 'node #897\ngini = 0.0\nsamples = 33\nvalue = [24.603, 0.0]\nclass = y[0]'), Text(0.1376039507822737, 0.7083333333333334, 'node #898\narrival_month <= 6.5\ngini = 0.256\nsamples = 25\nvalue = [17.148, 3.036]\nclass = y[0]'), Text(0.1372698465730827, 0.6805555555555556, 'node #899\navg_price_per_room <= 107.5\ngini = 0.5\nsamples = 6\nvalue = [2.982, 3.036]\nclass = y[1]'), Text(0.1366016381547007, 0.6527777777777778, 'node #900\nlead_time <= 133.5\ngini = 0.317\nsamples = 3\nvalue = [0.746, 3.036]\nclass = y[1]'), Text(0.1362675339455097, 0.625, 'node #901\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1369357423638917, 0.625, 'node #902\ngini = 0.0\nsamples = 2\nvalue = [0.0, 3.036]\nclass = y[1]'), Text(0.1379380549914647, 0.6527777777777778, 'node #903\nlead_time <= 130.0\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.1376039507822737, 0.625, 'node #904\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.1382721592006557, 0.625, 'node #905\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.1379380549914647, 0.6805555555555556, 'node #906\ngini = 
0.0\nsamples = 19\nvalue = [14.165, 0.0]\nclass = y[0]'), Text(0.1379380549914647, 0.7361111111111112, 'node #907\ngini = -0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.1474495841968709, 0.7638888888888888, 'node #908\narrival_date <= 24.5\ngini = 0.5\nsamples = 242\nvalue = [120.034, 122.967]\nclass = y[1]'), Text(0.1456015702897832, 0.7361111111111112, 'node #909\ntotal_nights <= 3.5\ngini = 0.485\nsamples = 182\nvalue = [79.774, 113.859]\nclass = y[1]'), Text(0.14391016773075377, 0.7083333333333334, 'node #910\narrival_date <= 23.5\ngini = 0.448\nsamples = 141\nvalue = [53.68, 104.75]\nclass = y[1]'), Text(0.1421978836586499, 0.6805555555555556, 'node #911\navg_price_per_room <= 94.25\ngini = 0.484\nsamples = 121\nvalue = [52.934, 75.906]\nclass = y[1]'), Text(0.13977562814201516, 0.6527777777777778, 'node #912\navg_price_per_room <= 67.375\ngini = 0.477\nsamples = 62\nvalue = [35.041, 22.772]\nclass = y[0]'), Text(0.13894036761903766, 0.625, 'node #913\navg_price_per_room <= 52.125\ngini = 0.242\nsamples = 4\nvalue = [0.746, 4.554]\nclass = y[1]'), Text(0.13860626340984666, 0.5972222222222222, 'node #914\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.13927447182822866, 0.5972222222222222, 'node #915\ngini = -0.0\nsamples = 3\nvalue = [0.0, 4.554]\nclass = y[1]'), Text(0.14061088866499266, 0.625, 'node #916\navg_price_per_room <= 81.6\ngini = 0.453\nsamples = 58\nvalue = [34.295, 18.217]\nclass = y[0]'), Text(0.13994268024661066, 0.5972222222222222, 'node #917\narrival_month <= 3.5\ngini = 0.21\nsamples = 16\nvalue = [11.183, 1.518]\nclass = y[0]'), Text(0.13960857603741966, 0.5694444444444444, 'node #918\ngini = 0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.14027678445580166, 0.5694444444444444, 'node #919\ngini = 0.0\nsamples = 15\nvalue = [11.183, 0.0]\nclass = y[0]'), Text(0.14127909708337466, 0.5972222222222222, 'node #920\nroom_type_reserved_Room_Type 4 <= 0.5\ngini = 0.487\nsamples = 42\nvalue = 
[23.112, 16.699]\nclass = y[0]'), Text(0.14094499287418366, 0.5694444444444444, 'node #921\navg_price_per_room <= 86.43\ngini = 0.479\nsamples = 41\nvalue = [23.112, 15.181]\nclass = y[0]'), Text(0.14027678445580166, 0.5416666666666666, 'node #922\nlead_time <= 129.5\ngini = 0.5\nsamples = 19\nvalue = [9.692, 9.109]\nclass = y[0]'), Text(0.13994268024661066, 0.5138888888888888, 'node #923\narrival_month <= 5.0\ngini = 0.493\nsamples = 18\nvalue = [9.692, 7.591]\nclass = y[0]'), Text(0.13960857603741966, 0.4861111111111111, 'node #924\ntotal_nights <= 1.5\ngini = 0.499\nsamples = 16\nvalue = [8.201, 7.591]\nclass = y[0]'), Text(0.13927447182822866, 0.4583333333333333, 'node #925\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.13994268024661066, 0.4583333333333333, 'node #926\ngini = 0.5\nsamples = 15\nvalue = [7.456, 7.591]\nclass = y[1]'), Text(0.14027678445580166, 0.4861111111111111, 'node #927\ngini = 0.0\nsamples = 2\nvalue = [1.491, 0.0]\nclass = y[0]'), Text(0.14061088866499266, 0.5138888888888888, 'node #928\ngini = -0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.14161320129256566, 0.5416666666666666, 'node #929\narrival_year <= 2017.5\ngini = 0.429\nsamples = 22\nvalue = [13.42, 6.072]\nclass = y[0]'), Text(0.14127909708337466, 0.5138888888888888, 'node #930\ntotal_nights <= 2.0\ngini = 0.456\nsamples = 19\nvalue = [11.183, 6.072]\nclass = y[0]'), Text(0.14094499287418366, 0.4861111111111111, 'node #931\ngini = 0.447\nsamples = 10\nvalue = [5.964, 3.036]\nclass = y[0]'), Text(0.14161320129256566, 0.4861111111111111, 'node #932\ngini = 0.465\nsamples = 9\nvalue = [5.219, 3.036]\nclass = y[0]'), Text(0.14194730550175666, 0.5138888888888888, 'node #933\ngini = 0.0\nsamples = 3\nvalue = [2.237, 0.0]\nclass = y[0]'), Text(0.14161320129256566, 0.5694444444444444, 'node #934\ngini = -0.0\nsamples = 1\nvalue = [0.0, 1.518]\nclass = y[1]'), Text(0.14462013917528463, 0.6527777777777778, 'node #935\nroom_type_reserved_Room_Type 5 
<= 0.5\ngini = 0.377\nsamples = 59\nvalue = [17.893, 53.134]\nclass = y[1]'), Text(0.14428603496609366, 0.625, 'node #936\navg_price_per_room <= 130.54\ngini = 0.36\nsamples = 57\nvalue = [16.402, 53.134]\nclass = y[1]'), Text(0.14395193075690266, 0.5972222222222222, 'node #937\nroom_type_reserved_Room_Type 4 <= 0.5\ngini = 0.352\nsamples = 56\nvalue = [15.657, 53.134]\nclass = y[1]'), Text(0.14361782654771166, 0.5694444444444444, 'node #938\nno_of_adults <= 2.5\ngini = 0.342\nsamples = 55\nvalue = [14.911, 53.134]\nclass = y[1]'), Text(0.14294961812932966, 0.5416666666666666, 'node #939\navg_price_per_room <= 95.25\ngini = 0.328\nsamples = 52\nvalue = [13.42, 51.616]\nclass = y[1]'), Text(0.14261551392013866, 0.5138888888888888, 'node #940\narrival_date <= 11.5\ngini = 0.352\nsamples = 48\nvalue = [13.42, 45.543]\nclass = y[1]'), Text(0.14228140971094766, 0.4861111111111111, 'node #941\ngini = 0.257\nsamples = 15\nvalue = [2.982, 16.699]\nclass = y[1]'), Text(0.14294961812932966, 0.4861111111111111, 'node #942\nlead_time <= 134.5\ngini = 0.39\nsamples = 33\nvalue = [10.438, 28.844]\nclass = y[1]'), Text(0.14261551392013866, 0.4583333333333333, 'node #943\ngini = 0.423\nsamples = 17\nvalue = [5.964, 13.663]\nclass = y[1]'), Text(0.14328372233852066, 0.4583333333333333, 'node #944\ngini = 0.352\nsamples = 16\nvalue = [4.473, 15.181]\nclass = y[1]'), Text(0.14328372233852066, 0.5138888888888888, 'node #945\ngini = 0.0\nsamples = 4\nvalue = [0.0, 6.072]\nclass = y[1]'), Text(0.14428603496609366, 0.5416666666666666, 'node #946\narrival_month <= 7.5\ngini = 0.5\nsamples = 3\nvalue = [1.491, 1.518]\nclass = y[1]'), Text(0.14395193075690266, 0.5138888888888888, 'node #947\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = y[0]'), Text(0.14462013917528463, 0.5138888888888888, 'node #948\ngini = 0.442\nsamples = 2\nvalue = [0.746, 1.518]\nclass = y[1]'), Text(0.14428603496609366, 0.5694444444444444, 'node #949\ngini = 0.0\nsamples = 1\nvalue = [0.746, 0.0]\nclass = 
#Top-10 most important features in the decision tree
#The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature
print(pd.DataFrame(decisiontree.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(
by="Imp", ascending=False).head(n=10))
| | Imp |
|---|---|
| lead_time | 0.36 |
| avg_price_per_room | 0.15 |
| market_segment_type_Online | 0.09 |
| arrival_date | 0.09 |
| no_of_special_requests | 0.09 |
| arrival_month | 0.07 |
| total_nights | 0.06 |
| no_of_adults | 0.03 |
| arrival_year | 0.02 |
| market_segment_type_Offline | 0.01 |
#visualization of feature importance
importances = decisiontree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
[Figure: horizontal bar chart "Feature Importances" (x-axis: Relative Importance) for all 26 features; lead_time ranks highest, followed by avg_price_per_room and market_segment_type_Online.]
lead_time is the most important feature by a wide margin (0.36), followed by avg_price_per_room (0.15).
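As a quick sanity check on what these numbers mean, the sketch below fits a tree on synthetic data (not the booking data) and confirms that the impurity-based importances are non-negative and sum to 1. The dataset, the `feature_i` names, and the variable names are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the booking data (assumption: 6 numeric features).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)
cols = [f"feature_{i}" for i in range(X_demo.shape[1])]  # hypothetical names

demo_tree = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)

# Impurity-based importances, sorted from most to least important.
imp = pd.Series(demo_tree.feature_importances_, index=cols).sort_values(
    ascending=False
)
print(imp.round(3))
print("Sum of importances:", round(imp.sum(), 6))
```

Because the values are normalized, an importance of 0.36 can be read as roughly 36% of the total impurity reduction achieved by the tree.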
Using GridSearch for Hyperparameter tuning of our tree model
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# Grid of parameters to choose from
parameters = {
"max_depth": np.arange(4, 13, 4), # [4, 8, 12]
"criterion": ["entropy", "gini"],
"splitter": ["best", "random"],
"min_impurity_decrease": [0.00001, 0.0001, 0.01, .1, 1],
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Type of scoring used to compare parameter combinations (recall)
recall_scorer = make_scorer(recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=recall_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight='balanced', criterion='entropy',
max_depth=4, max_leaf_nodes=50,
min_impurity_decrease=1e-05, min_samples_split=10,
random_state=1)
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train = model_performance_classification_sklearn(
estimator, X_train, y_train
)
decision_tree_tune_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81 | 0.72 | 0.70 | 0.71 |
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test = model_performance_classification_sklearn(
estimator, X_test, y_test
)
decision_tree_tune_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.81 | 0.72 | 0.71 | 0.71 |
The training and test results are similar, but the F1 score is only 0.71 on both sets, which is low.
Recall is likewise 0.72 on both sets.
Training recall dropped from close to 1.00 for the unpruned tree to 0.72, but this is still an improvement because the model now overfits far less.
There is still more work to do.
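One way to see how much each parameter combination overfits is to compare training and cross-validation scores inside the grid search itself. The following is a minimal sketch on synthetic data, with a deliberately small hypothetical grid (`small_grid`); it is not the grid used in this notebook.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced data standing in for the bookings (assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.67, 0.33], random_state=1
)

small_grid = {"max_depth": [2, 4, 8], "min_samples_split": [10, 50]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=1, class_weight="balanced"),
    small_grid,
    scoring="recall",
    cv=5,
    return_train_score=True,  # keep training scores so the gap can be computed
)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)[
    ["params", "mean_train_score", "mean_test_score"]
]
# A large positive gap means the combination fits the training folds
# much better than the held-out folds.
results["gap"] = results["mean_train_score"] - results["mean_test_score"]
print(search.best_params_)
print(results.sort_values("mean_test_score", ascending=False).round(3))
```

Here `mean_test_score` is the cross-validation score on held-out folds, not the score on the notebook's separate test set.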
Visualizing the Decision Tree
plt.figure(figsize=(35, 10))
out = tree.plot_tree(
estimator,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
[Figure: plot of the pre-pruned decision tree.]
This decision tree is much easier to read than the previous one.
It still has several nodes, but the structure can be followed.
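When a tree is shallow enough to read, its rules can also be printed as text, which is sometimes easier to share than a figure. The sketch below uses scikit-learn's `export_text` on a small synthetic dataset; the data and the `f0`..`f3` feature names are placeholders, not the booking features.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data with hypothetical feature names (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
cols = ["f0", "f1", "f2", "f3"]

# A depth-2 tree has at most four leaves, so the rule list stays short.
shallow = DecisionTreeClassifier(max_depth=2, random_state=1).fit(X_demo, y_demo)
rules = export_text(shallow, feature_names=cols)
print(rules)
```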
#Print the top-10 most important features in the decision tree
#The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature
print(pd.DataFrame(estimator.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(
by="Imp", ascending=False).head(n=10))
| | Imp |
|---|---|
| lead_time | 0.46 |
| market_segment_type_Online | 0.22 |
| no_of_special_requests | 0.20 |
| avg_price_per_room | 0.08 |
| arrival_month | 0.02 |
| market_segment_type_Offline | 0.01 |
| type_of_meal_plan_Not Selected | 0.00 |
| market_segment_type_Corporate | 0.00 |
| market_segment_type_Complementary | 0.00 |
| room_type_reserved_Room_Type 7 | 0.00 |
The pre-pruned decision tree model shows that lead_time and market_segment_type_Online are the two most important variables for predicting a booking's cancellation.
The third most important variable is no_of_special_requests.
Cost Complexity Pruning
Minimal cost complexity pruning identifies the node with the ‘weakest link’ in a decision tree. This ‘weakest link’ is characterized by an effective alpha, where nodes with the smallest effective alpha are pruned first. To determine suitable values for the pruning parameter (ccp_alpha), scikit-learn provides the DecisionTreeClassifier.cost_complexity_pruning_path function. This function returns the effective alphas and corresponding total leaf impurities at each step of the pruning process. As the alpha value increases, more of the tree is pruned, leading to increased total impurity in its leaves.
In summary, cost complexity pruning helps control the size of decision trees by selectively removing nodes based on their impact on model complexity and impurity reduction. By adjusting the ccp_alpha parameter, you can strike a balance between model accuracy and simplicity.
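For reference, the quantity behind this procedure can be written out explicitly. This is the standard formulation of minimal cost-complexity pruning; the notation is introduced here and does not appear elsewhere in the notebook: $R(T)$ is the total impurity of the leaves of tree $T$, $|\tilde{T}|$ is its number of leaves, and $T_t$ is the subtree rooted at node $t$.

```latex
R_\alpha(T) = R(T) + \alpha\,|\tilde{T}|
\qquad
\alpha_{\text{eff}}(t) = \frac{R(t) - R(T_t)}{|\tilde{T}_t| - 1}
```

A node $t$ is collapsed into a leaf once ccp_alpha reaches $\alpha_{\text{eff}}(t)$, which is why larger values of ccp_alpha remove more of the tree.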
# Set the classifier first
cif = DecisionTreeClassifier(random_state=1, class_weight="balanced")
# Compute the pruning for training data
path = cif.cost_complexity_pruning_path(X_train, y_train)
# Come up with all the ccp alphas and corresponding impurities
ccp_alphas, impurities = path.ccp_alphas, path.impurities
'''
Explanation of ccp_alphas: These values are the effective alphas at which the tree changes during pruning.
A subtree is pruned once ccp_alpha reaches its effective alpha, i.e. when the impurity reduction it provides no longer justifies its extra leaves.
The array is sorted in increasing order. The smallest alpha corresponds to the full (most complex) tree;
each subsequent alpha increases the penalty for complexity, resulting in a simpler (more pruned) tree.
The goal is to find the ccp_alpha value that maximizes performance on the validation or test set, which may not necessarily be the highest ccp_alpha.
The optimal ccp_alpha achieves the best trade-off between overfitting and underfitting, leading to a model that generalizes well to new data.
Explanation of impurities: This array provides the total impurity of the tree's leaves at each level of pruning defined by ccp_alphas.
Impurity is a measure of how mixed the classes are in the leaves of the tree. As pruning increases (with larger ccp_alpha values),
the total training impurity of the leaves increases, because merged leaves are more mixed; performance on unseen data can still improve at first as overfitting is reduced.
'''
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.00 | 0.01 |
| 1 | 0.00 | 0.01 |
| 2 | 0.00 | 0.01 |
| 3 | 0.00 | 0.01 |
| 4 | 0.00 | 0.01 |
| ... | ... | ... |
| 1843 | 0.01 | 0.33 |
| 1844 | 0.01 | 0.34 |
| 1845 | 0.01 | 0.35 |
| 1846 | 0.03 | 0.42 |
| 1847 | 0.08 | 0.50 |
1848 rows × 2 columns
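The table shows both columns growing together. A small check on synthetic data (an assumption; not the booking data) illustrates the same behavior: alphas and total leaf impurities are both non-decreasing along the path, and the fitted tree shrinks as ccp_alpha grows. A small tolerance is used because floating-point error can produce tiny negative differences.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=1)
demo_path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(
    X_demo, y_demo
)
alphas, imps = demo_path.ccp_alphas, demo_path.impurities

tol = 1e-9  # tolerance for floating-point noise
print("alphas non-decreasing:    ", bool(np.all(np.diff(alphas) >= -tol)))
print("impurities non-decreasing:", bool(np.all(np.diff(imps) >= -tol)))

# Node counts for the smallest, a middle, and the largest alpha on the path.
counts = []
for a in (alphas[0], alphas[len(alphas) // 2], alphas[-1]):
    # max(a, 0.0) guards against a tiny negative alpha from rounding error.
    fitted = DecisionTreeClassifier(random_state=1, ccp_alpha=max(a, 0.0))
    fitted.fit(X_demo, y_demo)
    counts.append(fitted.tree_.node_count)
    print(f"ccp_alpha={a:.5f} -> {counts[-1]} nodes")
```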
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("Total impurity of leaves")
ax.set_title("Total impurity vs Effective alpha for training set")
plt.show()
[Figure: "Total impurity vs Effective alpha for training set" (x-axis: effective alpha; y-axis: Total impurity of leaves).]
Next, we train one decision tree for each effective alpha. The last value in ccp_alphas is the alpha value that prunes the whole tree, leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(
        random_state=1, ccp_alpha=ccp_alpha, class_weight="balanced"
    )
    clf.fit(X_train, y_train)  # fit one tree per effective alpha
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
For the remainder, we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node. Here we show that the number of nodes and the tree depth decrease as alpha increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("Alpha")
ax[0].set_ylabel("Number of nodes")
ax[0].set_title("Number of nodes vs Alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("Alpha")
ax[1].set_ylabel("Depth of tree")
ax[1].set_title("Depth vs Alpha")
fig.tight_layout()
[Figure: two stacked panels, "Number of nodes vs Alpha" (y-axis: Number of nodes) and "Depth vs Alpha" (y-axis: Depth of tree), both plotted against Alpha.]
f1_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = f1_score(y_train, pred_train)
    f1_train.append(values_train)
f1_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = f1_score(y_test, pred_test)
    f1_test.append(values_test)
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("F1")
ax.set_title("F1 vs Alpha for training and testing sets")
ax.plot(
ccp_alphas, f1_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, f1_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
[Figure: "F1 vs Alpha for training and testing sets" (x-axis: alpha; y-axis: F1; series: train, test).]
The F1 curves for the training and test sets line up almost perfectly.
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)
recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(
ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post",
)
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
[Figure: "Recall vs alpha for training and testing sets" (x-axis: alpha; y-axis: Recall; series: train, test).]
The recall curves for the training and test sets line up almost perfectly.
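Choosing alpha by its score on the test set means the test set is no longer an untouched estimate of generalization. An alternative, sketched here on synthetic data as an assumption rather than as this notebook's method, is to pick ccp_alpha by cross-validation on the training data only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=1)

base = DecisionTreeClassifier(random_state=1, class_weight="balanced")
path_alphas = base.cost_complexity_pruning_path(X_demo, y_demo).ccp_alphas

# Drop the last alpha (single-node tree), clip tiny negatives caused by
# floating-point error, and keep every 5th value so the loop stays cheap.
candidates = np.unique(np.clip(path_alphas[:-1], 0.0, None))[::5]

cv_f1 = [
    cross_val_score(
        DecisionTreeClassifier(
            random_state=1, class_weight="balanced", ccp_alpha=a
        ),
        X_demo,
        y_demo,
        scoring="f1",
        cv=5,
    ).mean()
    for a in candidates
]
best_alpha = candidates[int(np.argmax(cv_f1))]
print(f"best ccp_alpha by 5-fold CV F1: {best_alpha:.5f} (F1={max(cv_f1):.3f})")
```

The test set can then be used once, at the end, to report the performance of the tree fitted with `best_alpha`.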
# select the tree with the highest test F1 score
index_post = np.argmax(f1_test)
decisiontree_post = clfs[index_post]
print(decisiontree_post)
DecisionTreeClassifier(ccp_alpha=0.00012535266224369257,
class_weight='balanced', random_state=1)
# select the tree with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.0001547772202137408, class_weight='balanced',
random_state=1)
decisiontree_post.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=0.00012535266224369257,
class_weight='balanced', random_state=1)
Performance on Training Set
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_post_perf_train = model_performance_classification_sklearn(
best_model, X_train, y_train
)
decision_tree_post_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.88 | 0.89 | 0.77 | 0.83 |
Performance on Test Set
confusion_matrix_sklearn(best_model, X_test, y_test)
decision_tree_post_perf_test = model_performance_classification_sklearn(
best_model, X_test, y_test
)
decision_tree_post_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.86 | 0.86 | 0.74 | 0.79 |
The F1 score is 0.83 on the training set compared to 0.79 on the test set.
The recall score is 0.89 on the training set compared to 0.86 on the test set.
plt.figure(figsize=(20, 10))
out = tree.plot_tree(
best_model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
[Figure: plot of the post-pruned decision tree.]
The post-pruned tree is still not noticeably less complex.
#Print the top-10 most important features in the decision tree
#The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature
importances = decisiontree_post.feature_importances_
indices = np.argsort(importances)
print(pd.DataFrame(decisiontree_post.feature_importances_, columns=["Imp"], index=X_train.columns).sort_values(
by="Imp", ascending=False).head(n=10))
| | Imp |
|---|---|
| lead_time | 0.40 |
| market_segment_type_Online | 0.14 |
| no_of_special_requests | 0.12 |
| avg_price_per_room | 0.12 |
| arrival_month | 0.06 |
| arrival_date | 0.04 |
| total_nights | 0.03 |
| no_of_adults | 0.03 |
| arrival_year | 0.02 |
| market_segment_type_Offline | 0.01 |
#visualization of feature importance
importances = decisiontree_post.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="aqua", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
[Figure: horizontal bar chart "Feature Importances" (x-axis: Relative Importance) for all 26 features.]
# Choose the type of classifier.
estimator_updated = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
"class_weight": [None, "balanced"],
"max_depth": np.arange(3, 13, 3), # [3, 6, 9, 12]
"criterion": ["entropy", "gini"],
"splitter": ["best", "random"],
"min_impurity_decrease": [0.00001, 0.0001, 0.01, 0.1, 1],
"max_leaf_nodes": [50, 75, 150, 250],
"min_samples_split": [10, 30, 50, 70],
}
# Type of scoring used to compare parameter combinations
scorer = make_scorer(f1_score)
# Run the grid search
grid_obj = GridSearchCV(estimator_updated, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator_updated = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator_updated.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=12, max_leaf_nodes=250,
min_impurity_decrease=0.0001, min_samples_split=10,
random_state=1)
confusion_matrix_sklearn(estimator_updated, X_train, y_train)
decision_tree_tune_perf_train_updated = model_performance_classification_sklearn(
estimator_updated, X_train, y_train
)
decision_tree_tune_perf_train_updated
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.89 | 0.81 | 0.84 | 0.83 |
confusion_matrix_sklearn(estimator_updated, X_test, y_test)
decision_tree_tune_perf_test_updated = model_performance_classification_sklearn(
estimator_updated, X_test, y_test
)
decision_tree_tune_perf_test_updated
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.88 | 0.79 | 0.83 | 0.81 |
plt.figure(figsize=(35, 10))
out = tree.plot_tree(
estimator_updated,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
[Figure: plot of the estimator_updated decision tree.]
The model has a better F1 score on both the training and testing datasets than the logistic regression models.
This is a simpler model and appears to perform similarly well on both the training and test dataset, indicating that this model is not overfit to the training data and thereby should provide more generalizable predictions.
# training performance comparison
models_train_comp_df = pd.concat(
[
decision_tree_perf_train_without.T,
decision_tree_perf_train.T,
decision_tree_tune_perf_train.T,
decision_tree_post_perf_train.T,
decision_tree_tune_perf_train_updated.T,
],
axis=1,
)
models_train_comp_df.columns = [
"Decision Tree without class_weight",
"Decision Tree with class_weight",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
"Decision Tree (Readable Tree)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree without class_weight | Decision Tree with class_weight | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) | Decision Tree (Readable Tree) |
|---|---|---|---|---|---|
| Accuracy | 0.99 | 0.99 | 0.81 | 0.88 | 0.89 |
| Recall | 0.99 | 1.00 | 0.72 | 0.89 | 0.81 |
| Precision | 1.00 | 0.98 | 0.70 | 0.77 | 0.84 |
| F1 | 0.99 | 0.99 | 0.71 | 0.83 | 0.83 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
decision_tree_perf_test_without.T,
decision_tree_perf_test.T,
decision_tree_tune_perf_test.T,
decision_tree_post_perf_test.T,
decision_tree_tune_perf_test_updated.T,
],
axis=1,
)
models_test_comp_df.columns = [
"Decision Tree without class_weight",
"Decision Tree with class_weight",
"Decision Tree (Pre-Pruning)",
"Decision Tree (Post-Pruning)",
"Decision Tree (Readable Tree)",
]
print("Testing performance comparison:")
models_test_comp_df
Testing performance comparison:
| | Decision Tree without class_weight | Decision Tree with class_weight | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) | Decision Tree (Readable Tree) |
|---|---|---|---|---|---|
| Accuracy | 0.87 | 0.86 | 0.81 | 0.86 | 0.88 |
| Recall | 0.80 | 0.81 | 0.72 | 0.86 | 0.79 |
| Precision | 0.79 | 0.78 | 0.71 | 0.74 | 0.83 |
| F1 | 0.80 | 0.79 | 0.71 | 0.79 | 0.81 |
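To make the overfitting comparison explicit, the train-minus-test gap can be computed directly from the two comparison tables. The sketch below copies the rounded F1 and recall values from those tables by hand; the short model labels and column names are shorthand introduced here.

```python
import pandas as pd

models = [
    "Without class_weight",
    "With class_weight",
    "Pre-Pruning",
    "Post-Pruning",
    "Readable Tree",
]
# F1 and recall values copied from the training and testing comparison tables.
summary = pd.DataFrame(
    {
        "F1_train": [0.99, 0.99, 0.71, 0.83, 0.83],
        "F1_test": [0.80, 0.79, 0.71, 0.79, 0.81],
        "Recall_train": [0.99, 1.00, 0.72, 0.89, 0.81],
        "Recall_test": [0.80, 0.81, 0.72, 0.86, 0.79],
    },
    index=models,
)
# Gap = training score minus test score; larger gaps suggest more overfitting.
summary["F1_gap"] = (summary["F1_train"] - summary["F1_test"]).round(2)
summary["Recall_gap"] = (summary["Recall_train"] - summary["Recall_test"]).round(2)
print(summary[["F1_test", "F1_gap", "Recall_test", "Recall_gap"]])
```

Read this way, the two unpruned trees show the largest gaps, while the pruned and tuned trees trade some training fit for much smaller gaps.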
We analyzed 36,275 booking records and built five different Decision Tree classifiers to predict cancellations. These models can assist INN Hotels Group in predicting whether a booking will be canceled before the check-in date.
All five decision-tree models outperform the best-performing logistic regression model on our objective criterion (F1 score): the logistic regression model scored 0.69 on the test set, while the lowest-scoring decision tree (Pre-Pruning) scored 0.71.
We visualized each model's decision tree and confusion matrix for better understanding. However, interpreting predictions from the original, pre-pruned, and post-pruned decision-tree models may be challenging for clients. For instance, the original and post-pruned decision trees are visually complex. The pre-pruned decision tree is somewhat complex but remains readable.
Despite efforts to reduce overfitting through tuning, pre-pruning and post-pruning had only a limited effect on test performance. The pre-pruned decision tree looks better but may still overfit somewhat.
The best-performing model based on recall, the post-pruned decision tree, has a small performance gap between the training and test datasets (recall of 0.89 vs 0.86).
The best-performing model based on F1 score, the readable decision tree, also has a small performance gap between the training and test datasets (F1 of 0.83 vs 0.81).
INN Hotels should weigh the tradeoff between model performance, overfitting, and interpretability.
If a more understandable prediction model is desired, a max tree depth of 12 is recommended.
Alternatively, if INN Hotels prioritizes performance and is comfortable with a “black-box” model, either the post-pruned or the readable tree is a suitable choice.
Our EDA and predictions from both models show:
Guests are less likely to cancel if:
Guests are more likely to cancel if:
What profitable policies for cancellations and refunds can the hotel adopt?
Considering the coefficients in the logistic regression models and the features in the decision-tree models, both prediction models suggest that INN Hotels should contemplate implementing distinct cancellation and refund policies for guests traveling for either business or personal reasons.
INN Hotels should implement more and better incentives for corporate guests. Currently only 30% of corporate bookings are from repeat customers. Offering a reward for choosing INN Hotels could further incentivize corporate guests to stay with INN Hotels rather than a competitor.
Repeat guests are very important because they cancel less often than other guests. However, repeat guests currently account for only 0.3% of all guests. Research should be done to determine how to increase this share, and incentives or loyalty programs should be introduced to raise it.
Moreover, if a hotel reaches full capacity or experiences overbooking, management can leverage the model to ensure that rooms remain available for repeat guests or business travelers.
By combining predictions from both models, management can identify the most probable scenarios for booking cancellations and allocate those rooms to the least likely cases within the same room category.
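One simple way to combine two models is to average their predicted cancellation probabilities and rank bookings by the result. The sketch below does this on synthetic data with scikit-learn's `LogisticRegression` standing in for the logistic regression model; that substitution, the data, and all variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
dtree = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X_tr, y_tr)

# Average the two predicted probabilities of the positive class (a soft vote).
p_avg = (logit.predict_proba(X_te)[:, 1] + dtree.predict_proba(X_te)[:, 1]) / 2

# Rank held-out rows from most to least likely to cancel.
ranking = np.argsort(-p_avg)
print("5 most likely cancellations (row positions):", ranking[:5])
print("their averaged probabilities:", p_avg[ranking[:5]].round(3))
```

Rooms attached to the bookings at the top of such a ranking are the natural candidates for reallocation within the same room category.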
What other recommendations would you suggest to the hotel?
INN Hotels should implement more and better incentives for online guests. Currently only 0.4% of all bookings are from repeat customers. More research should be done on why guests are not choosing to stay more often at an INN hotel. A comparison should be made between INN Hotels prices and those of competitors.
INN Hotels should implement more and better incentives for offline guests. Currently only 0.8% of all bookings are from repeat customers. More research should be done on why guests are not choosing to stay more often at an INN hotel. A comparison should be made between INN Hotels prices and those of competitors.
The costs associated with true and false positives and negatives should be calculated. If this is done, the models can be enhanced to maximize expected profits and predict expected losses. This would be extremely beneficial to management in their attempts to reduce losses.
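As an illustration of how such costs could be used, the sketch below picks the probability threshold that minimizes total cost on a toy example. The cost values, labels, and probabilities are all invented for illustration; real figures would have to come from INN Hotels.

```python
import numpy as np


def total_cost(y_true, proba, threshold, cost_fn, cost_fp):
    """Total cost of acting on predictions at a given probability threshold."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(proba) >= threshold).astype(int)
    fn = np.sum((pred == 0) & (y_true == 1))  # missed cancellations
    fp = np.sum((pred == 1) & (y_true == 0))  # bookings wrongly flagged
    return fn * cost_fn + fp * cost_fp


# Hypothetical costs per error (assumption): a missed cancellation is
# taken to be five times as costly as a wrongly flagged booking.
COST_FN, COST_FP = 100.0, 20.0

# Toy labels (1 = canceled) and predicted cancellation probabilities.
y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 1])
p_toy = np.array([0.92, 0.22, 0.45, 0.62, 0.08, 0.71, 0.31, 0.37])

thresholds = np.linspace(0.1, 0.9, 9)
costs = [total_cost(y_toy, p_toy, t, COST_FN, COST_FP) for t in thresholds]
best_t = thresholds[int(np.argmin(costs))]
print(f"lowest-cost threshold: {best_t:.1f}, total cost: {min(costs):.0f}")
```

When missed cancellations are the more expensive error, the lowest-cost threshold tends to fall below the default of 0.5, which favors recall over precision.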
More research should be done to understand why so many bookings are being canceled. More data and further analysis are needed to determine the cause.
Based on our data analysis, a clear seasonal pattern emerges in booking behavior.
As a result, INN Hotels should determine how to boost sales in the winter months. They can offer appealing deals that attract more customers and result in higher occupancy rates.
This information can also be used to allocate resources based on potential booking rates.